









THE ANNALS 

of 

MATHEMATICAL 

STATISTICS 

The Annals of Mathematical Statistics Is Affiliated 
with the American Statistical Association and Is 
Devoted to the Theory and Application of 
Mathematical Statistics 


Editorial Committee 
H. C. CARVER 
A. L. O’TOOLE 
T. E. RAIFORD


Volume VI, 1935 


PUBLISHED QUARTERLY 
ANN ARBOR, MICHIGAN 



The Annals is not copyrighted: any articles or tables appearing therein may 
be reproduced in whole or in part at any time if accompanied by 
the proper reference to this publication 


Four Dollars per annum 


Made in United States of America 


Address: Annals of Mathematical Statistics 
Post Office Box 171, Ann Arbor, Michigan 


Composed and Printed at the 
WAVERLY PRESS, Inc. 
Baltimore, Md. 



CONTENTS OF VOLUME VI


Some Interesting Features of Frequency Curves. Richmond T. Zoch. ... 1 

A Reconsideration of Sheppard's Corrections. W. T. Lewis. 11 

The Point Binomial and Probability Paper. Frank H. Byron. 21 

Inequalities Among Averages. Nilan Norris . 27 

Mathematical Expectation of Product Moments of Samples Drawn from 

a Set of Infinite Populations. Hyman M. Feldman. 30 

An Application of Orthogonalization Process to the Theory of Least 

Squares. Dr. Y. K. Wong. 53 

A Note on the Analysis of Variance. Solomon Kullback. 76

Problem Involving the Lexis Theory of Dispersion. Walter A. Hendricks. 78

A Method for Determining the Coefficients of a Characteristic Equation. 

Paul Horst. 83 

The Generalized Problem of Correct Matchings. Dwight W. Chapman 85 
Moments About the Arithmetic Mean of a Binomial Frequency Distribution. W. J. Kirkham. 96

On Certain Distribution Functions when the Law of the Universe is Poisson's First Law of Error. Frank M. Weida. 102

On the Problem of Confidence Intervals. J. Neyman. 111

Analysis of Variance Considered as an Application of Simple Error Theory. 

Walter A. Hendricks. 117 

Note on the Distributions of the Standard Deviations and Second Moments

of Samples from a Gram-Charlier Population. G. A. Baker. 127 

On the Finite Differences of a Polynomial. I. H. Barkey. 131 

Some Practical Interpolation Formulas. John L. Roberts. 133 

On Evaluating a Coefficient of Partial Correlation. Grace Strecker. 143
A Theory of Validation for Derivative Specifications and Check Lists. 

Lee Byrne. 146 

A Note on Sheppard's Corrections. Solomon Kullback.158 

The Limiting Distributions of Certain Statistics. J. L. Doob .160 

On the Postulate of the Arithmetic Mean. Richmond T. Zoch. .. 171 

The Shrinkage of the Brown-Spearman Prophecy Formula. Robert J. 

Wherry. 183 

The Likelihood Test of Independence in Contingency Tables. S. S. 

Wilks. 190 

The Probability that the Mean of a Second Sample Will Differ from the
Mean of a First Sample by Less than a Certain Multiple of the Standard
Deviation of the First Sample. G. A. Baker. 197

























On Samples from a Multivariate Normal Population. Solomon 

Kullback. 202

On a Criterion for the Rejection of Observations and the Distribution of the 
Ratio of Deviation to Sample Standard Deviation. William R. 

Thompson .214 

On Certain Coefficients Used in Mathematical Statistics. Everett H. Larguier. 220

Notice of the Organization of the Institute of Mathematical Statistics. . . . 227 
1935 Directory of Subscribers to the Annals of Mathematical Statistics.... 228 







SOME INTERESTING FEATURES OF FREQUENCY CURVES 

By Richmond T. Zoch 

Introduction 

It is well known that in the normal error curve the points of inflection are 
equidistant from the mode. However it has never been pointed out that this is 
also a characteristic of all of the bell-shaped Pearson Frequency Curves. This 
fact can be most easily shown by placing the mode at the abscissa x = 0. 

Many rough checks have been developed for use in applying the Theory of 
Least Squares. The second part of this paper develops a rough check on the 
computation for use when fitting a Pearson Frequency Curve to a set of observa¬ 
tions. No rough checks on computation are given in textbooks on Pearson's 
Frequency Curves. 

At present it is customary to follow a separate procedure for each Type of 
curve when computing the constants of a Pearson Frequency Curve. The 
third part of this paper shows how a single system may be followed for all Types. 
A single procedure is very desirable in order that the rough check of Part 2 may 
be quickly applied. 


Part 1. Points of Inflection 

Perhaps nothing brings out the limitations of the bell-shaped Pearson Curves 
in a more striking manner than a discussion of their points of inflection. In 
dealing with frequency curves it is well known that any curve can be fitted to a 
given distribution and that the real problem in curve fitting is the selection of a 
curve. Figures 1, 2, and 3 illustrate three hypothetical histograms. All three
of these histograms are bell-shaped yet none of them will be closely fitted by 
any of the Pearson Curves. The reasons will be pointed out presently. 

The differential equation from which Pearson derived his system of frequency 
curves is 


$$\frac{dy}{dx} = \frac{y(x - P)}{b_2 x^2 + b_1 x + b_0}.$$

By putting $x - P = X$, i.e. by placing the mode at the abscissa $X = 0$, this
differential equation may be written:

$$\frac{dy}{dX} = \frac{yX}{\pm B_2 X^2 \pm B_1 X + B_0},$$

where the + or — sign is taken according to the type of the curve. (It will be 
shown later that the constant term of the denominator must be less than zero.) 



Since in the Type III curve $B_2$ is 0 and in the “Normal Curve” both $B_2$ and $B_1$
are 0, it will be advantageous to consider at once the general case of

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

where $F(X)$ is an integral rational function of the $n$th degree, rather than
considering special cases first.


If

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

then

$$\frac{d^2y}{dX^2} = \frac{y}{[F(X)]^2}\left\{X^2 + F(X) - XF'(X)\right\}.$$

In order to locate the points of inflection, $d^2y/dX^2$ is equated to zero.
Then we have:

$$X^2 + F(X) - XF'(X) = 0. \tag{1}$$

This equation is always of the same degree as $F(X)$ except when $F(X)$ is linear or
constant. Hence we have proved the Theorem: If $y = G(X)$ be the solution
of the differential equation

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

then the number of points of inflection of $y$ cannot exceed the degree of $F(X)$
when $F(X)$ is of degree greater than one.

Now $F(X) = B_n X^n + B_{n-1} X^{n-1} + \cdots + B_2 X^2 + B_1 X + B_0$. Whence
equation (1) can be written in the form:

$$(1 - n)B_n X^n + (2 - n)B_{n-1} X^{n-1} + (3 - n)B_{n-2} X^{n-2} + \cdots + (r - n)B_{n-r+1} X^{n-r+1} + \cdots - 3B_4 X^4 - 2B_3 X^3 + (1 - B_2) X^2 + B_0 = 0.$$




Hence we have established the Theorem: The coefficient of the linear term in
$X$ in the equation of the points of inflection is zero.
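The term-by-term form of equation (1) can be checked mechanically. The following is a minimal sketch, not part of the original paper: the function name `inflection_polynomial` and the sample coefficients are illustrative assumptions. It builds the coefficients of $X^2 + F(X) - XF'(X)$ from those of $F(X)$ and shows that the linear term drops out.

```python
from fractions import Fraction

def inflection_polynomial(F):
    """Coefficients (constant term first) of X^2 + F(X) - X*F'(X).

    F is a list [B_0, B_1, ..., B_n] holding the coefficients of F(X).
    """
    out = [Fraction(0)] * max(len(F), 3)
    for k, B_k in enumerate(F):
        # B_k X^k - X * (k B_k X^(k-1)) = (1 - k) B_k X^k
        out[k] += (1 - k) * Fraction(B_k)
    out[2] += 1  # the separate X^2 term of equation (1)
    return out

# Arbitrary illustrative F(X) = -3 + 5X + 2X^2 + 7X^3 (hypothetical values).
coeffs = inflection_polynomial([-3, 5, 2, 7])
print(coeffs)
```

Whatever value is given to $B_1$, the entry `coeffs[1]` is zero, which is the content of the Theorem just stated.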




For the “Normal Curve” and also for Type III,

$$B_2 = B_3 = B_4 = \cdots = B_n = 0.$$

Hence the points of inflection of these two Types are given by $X = \pm\sqrt{-B_0}$.
For Types I and II, $B_2$ is positive and $B_3 = B_4 = \cdots = B_n = 0$, and the
points of inflection are

$$X = \pm\sqrt{\frac{-B_0}{1 - B_2}}.$$

Hence the points of inflection are undefined if $B_2 = 1$, are pure imaginary if
$B_2 > 1$, and real if $B_2 < 1$.

For Types IV, V, VI and VII, $B_2$ is negative and $B_3 = \cdots = B_n = 0$, and the
points of inflection are at

$$X = \pm\sqrt{\frac{-B_0}{1 + B_2}},$$

where $B_2$ stands for the magnitude of the coefficient, its sign having been
written explicitly as in the $\pm$ convention adopted above.

In some of these Types it may happen that the abscissae of the points of
inflection though real will lie beyond the range of the curve. Thus Types III
and VI may have 1 or 2 points of inflection, the single point of inflection
occurring when $\sqrt{-B_0/(1 + B_2)}$ exceeds the range of the curve in the
direction that the range is limited. Type II may have 0 or 2 points of
inflection, as there will be no real points of inflection when $B_2 \geq 1$.
Type I may have 0, 1 or 2 points of inflection. Types IV, V and VII as well as
the “Normal Curve” always have 2 and only 2 points of inflection.

Now it should be noted that when one of the eight bell-shaped Pearson curves 
has two points of inflection then the abscissae of these 2 points of inflection are 
equidistant from the abscissa of the mode. In figure 1 a point of inflection will
be at abscissa $b$ and another at abscissa $a$. ($M$ is the abscissa of the mode.)
Since $b - M \neq M - a$ none of the Pearson curves will fit this histogram closely.
In figure 2, points of inflection occur at abscissae $a$, $b$, and $c$. Since a Pearson
curve can have at most two points of inflection no Pearson curve will fit this 
histogram closely. In figure 3 there are four points of inflection and no Pearson 
curve will fit this histogram closely. 
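The equidistance of the points of inflection from the mode can be illustrated numerically. The sketch below is an illustration only and rests on stated assumptions: it takes the quadratic case of equation (1), namely $(1 - B_2)X^2 + B_0 = 0$, lets `B2` carry its own algebraic sign, and ignores whether the abscissae found lie inside the range of the curve. The name `inflection_points` is hypothetical.

```python
import math

def inflection_points(B0, B2):
    """Real roots of (1 - B2) X^2 + B0 = 0, measured from the mode (X = 0).

    B2 is taken with its own sign; B0 is assumed negative, as required
    for a bell-shaped curve.
    """
    if B2 == 1:
        return None          # the equation degenerates: no finite solution
    q = -B0 / (1 - B2)
    if q < 0:
        return ()            # pure imaginary roots: no real points of inflection
    r = math.sqrt(q)
    return (-r, r)           # always symmetric about the mode

# Normal curve with standard deviation 2: B0 = -4, B2 = 0.
print(inflection_points(-4.0, 0.0))
```

For the normal curve this gives the familiar abscissae at one standard deviation on either side of the mode; a histogram whose apparent points of inflection are markedly unequal in distance from the mode is therefore a poor candidate for a Pearson curve.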


Part 2. Range 

Definition: A bell-shaped curve is a continuous curve which starts at zero 
(or zero as a limit), rises to a single maximum, at which maximum point the 
first derivative is zero, and then falls to zero (or zero as a limit). 

Or, more formally, $y = G(x)$ is a bell-shaped curve if $G(x_1) = G(x_2) = 0$ and
if $G'(P) = 0$ and $G''(P) < 0$, where $G(x)$ is continuous and does not vanish in the
interval from $x_1$ to $x_2$ and $P$ is a unique point in this interval.

If a bell-shaped curve has the value of zero at two finite points, one on each 
side of the maximum (mode), it is said to be of limited range in both directions, 
or briefly, of limited range. 

If a bell-shaped curve has the value of zero at only one finite point it is said 
to be of limited range in one direction, or also of unlimited range in one direction. 

If a bell-shaped curve has the value of zero only at $\pm\infty$, i.e. at no finite points,
it is said to be of unlimited range in both directions, or briefly, of unlimited range. 

Theorem I: If $F(x)$ can be separated into a finite number of factors each
either of the form $(x - r_i)$ or $(x^2 + 2r_j x + r_j^2 + r_{0j}^2)$, where no real root is
repeated, and $y = G(x)$ is a bell-shaped curve which is a solution of the differential
equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then if $F(x)$ has no real roots, $y$ is of unlimited range in both directions; if all of
the real roots of $F(x)$ lie on the same side of $P$, $y$ is of limited range in one (that)
direction; if at least one real root of $F(x)$ lies on one side of $P$ and at least one on
the other side, $y$ is of limited range in both directions.

Proof: If $F(x) = 0$ when $x = P$, we have

$$\frac{dy}{dx} = \frac{y}{g(x)},$$

where $g(x) = F(x) \div (x - P)$. This derivative is zero only when $y = 0$ or
$g(x) = \pm\infty$. Hence the solution does not have a finite maximum and therefore
is not a bell-shaped curve. If $F(x) > 0$ when $x = P$, we have

$$\left[\frac{d^2y}{dx^2}\right]_{x=P} = \frac{y}{[F(x)]_{x=P}},$$

which is greater than zero and, since at a maximum the second derivative must
not be greater than zero, in this case the solution would have a minimum at
$x = P$ and therefore would not be a bell-shaped curve. As the theorem concerns
only those solutions which are bell-shaped curves, $F(x) < 0$ when $x = P$. If
$F(x) = 0$ when $x \neq P$ then $dy/dx = \infty$ unless $y$ is also zero. Assume $y \neq 0$.
Since $F(x)$ is negative, if $y \neq 0$ when $F(x) = 0$ then $dy/dx \to -\infty$ as $F(x) \to 0$,
for an $x > P$, and changes to $+\infty$ as $F(x)$ changes sign on passing through the
value 0. Hence the curve would contain another maximum before falling to
zero and therefore the solution is not a bell-shaped curve. Similar reasoning
holds for an $x < P$. Therefore if $y \neq 0$ when $F(x) = 0$, the curve is not
bell-shaped. If $y = 0$ when $F(x) = 0$, the curve has its range limited at this point.
That is, any real number which makes $F(x)$ vanish will also make $y$ vanish if $y$
represents a bell-shaped curve. Hence if all of the real roots lie on the same side
of $P$ the curve is of limited range in that direction only, while if at least one of
the real roots lies on each side of $P$ the curve is of limited range in both
directions. If $F(x)$ contains no real roots it does not vanish for any real value of $x$.
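In this case, by partial fractions the differential equation becomes:

$$\frac{dy}{y} = \frac{k_{11}\,dx}{(x + r_1)^2 + r_{01}^2} + \frac{k_{12}\,dx}{(x + r_2)^2 + r_{02}^2} + \cdots + \frac{2k_{21}(x + r_1)\,dx}{(x + r_1)^2 + r_{01}^2} + \frac{2k_{22}(x + r_2)\,dx}{(x + r_2)^2 + r_{02}^2} + \cdots$$

On integrating,

$$y = C\left[(x + r_1)^2 + r_{01}^2\right]^{k_{21}} \left[(x + r_2)^2 + r_{02}^2\right]^{k_{22}} \cdots e^{\frac{k_{11}}{r_{01}} \arctan \frac{x + r_1}{r_{01}}} \cdots$$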

Hence y does not vanish for a finite real value of x and the Theorem is fully 
established. 

Theorem II: If $F(x)$ can be separated into a finite number of factors each
either of the form $(x - r_i)$ or $(x^2 + 2r_j x + r_j^2 + r_{0j}^2)$, where no real root is repeated,
and $y = G(x)$ is a bell-shaped curve which is a solution of the differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then if $y$ is of unlimited range, $F(x)$ contains no real roots; if $y$
is of limited range in one direction, all of the real roots of $F(x)$ lie on the same
(that) side of $P$; if $y$ is of limited range in both directions, at least one of the real
roots of $F(x)$ lies on one side of $P$ and at least one on the other.

Proof: By partial fractions the differential equation may be written:

$$\frac{dy}{y} = \frac{k_{11}\,dx}{x - r_{11}} + \frac{k_{12}\,dx}{x - r_{12}} + \cdots + \frac{k_{21}\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{k_{22}\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots + \frac{2k_{31}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{2k_{32}(x + r_{22})\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots$$

and on integrating:

$$y = C(x - r_{11})^{k_{11}} (x - r_{12})^{k_{12}} \cdots \left[(x + r_{21})^2 + r_{01}^2\right]^{k_{31}} \cdots e^{\frac{k_{21}}{r_{01}} \arctan \frac{x + r_{21}}{r_{01}}} \cdots$$


Hence $y = 0$ for $x = r_{11}, r_{12}, \cdots$ and for no other finite values of $x$ provided
$k_{11}, k_{12}, \cdots$ are positive. If one or more of the $k_{1j}$ are negative, $y = \infty$ at such
points and unless some $r_{1j}$ closer to $P$ has previously made $y$ vanish, the curve
is not bell-shaped. Therefore, for bell-shaped curves, the exponent of the factor
containing the real root of smallest absolute value on each side of $P$ is positive.
Therefore: if $y$ is of limited range in both directions, at least one real root lies on
each side of $P$; if $y$ is of unlimited range in one direction, all of the real roots lie
on the same side of $P$; if $y$ is of unlimited range, $F(x)$ contains no real roots. Hence
the Theorem is established.

The effect of repeated real roots will now be considered. If a real root is
repeated an odd number of times at $x = r$, then $F(x)$ changes sign at $x = r$
and the first theorem is true. If a real root is repeated an even number of times
at $x = r$, then $F(x)$ does not change sign at $x = r$ and we know that either (a)
$y = 0$ at $x = r$; or (b) $y$ is finite and $\neq 0$ and $dy/dx = \pm\infty$ at $x = r$, i.e. there is a
point of inflection at $x = r$. It will now be shown that (b) cannot occur. If
case (b) is possible, $y$ is continuous at $x = r$, $dy/dx = \pm\infty$ according as $(r - P) \lessgtr 0$;
moreover $dy/dx$ does not change sign in the neighborhood of the point $x = r$, and
$d^2y/dx^2$ changes sign from $+\infty$ to $-\infty$ or vice versa according as $(r - P) \lessgtr 0$.
Now

$$\frac{d^2y}{dx^2} = \frac{y}{[F(x)]^2}\left\{(x - P)^2 + F(x) - (x - P)\frac{d}{dx}F(x)\right\}.$$

Whence if $y$ is finite and $\neq 0$, $d^2y/dx^2$ does not change sign at $x = r$ because it is
possible to select a neighborhood such that

$$\left|(x - P)^2\right| > \left|F(x) - (x - P)\frac{d}{dx}F(x)\right|$$

for an $x$ differing from $r$ by $\epsilon$, where $\epsilon$ is a small positive quantity. Therefore
case (b) is not possible and $y = 0$ when a real root is repeated an even number of
times. That is to say the range of the curve is limited at a point where a real
root is repeated an even number of times. Thus Theorem I always holds for
repeated roots.

For Theorem II it is clear that this Theorem holds for repeated roots when a
non-repeated root lies closer to $P$, and on the same side, than the repeated root.
Suppose that the repeated root is the nearest root to $P$ (on a given side of $P$).
Then by partial fractions:

$$\frac{dy}{y} = \frac{k_{11}\,dx}{x - r_{11}} + \frac{k_{12}\,dx}{(x - r_{11})^2} + \frac{k_{13}\,dx}{(x - r_{11})^3} + \cdots + \frac{k_{41}\,dx}{x - r_{41}} + \frac{k_{42}\,dx}{x - r_{42}} + \cdots + \frac{k_{21}\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{k_{22}\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots + \frac{2k_{31}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \cdots$$

and on integrating:

$$y = C(x - r_{11})^{k_{11}} (x - r_{41})^{k_{41}} (x - r_{42})^{k_{42}} \cdots \left[(x + r_{21})^2 + r_{01}^2\right]^{k_{31}} \cdots e^{\frac{k_{21}}{r_{01}} \arctan \frac{x + r_{21}}{r_{01}}} \cdots e^{-\frac{k_{12}}{x - r_{11}}}\, e^{-\frac{k_{13}}{2(x - r_{11})^2}} \cdots$$

Hence $y$ can $= 0$ only for $x = r_{11}$ or for $x = r_{41}, r_{42}, \cdots$ and for no other finite
values of $x$. Since by hypothesis $y$ is bell-shaped, the proper $k_{ij}$ must be
positive and Theorem II always holds for repeated roots.

Theorems I and II can now be combined and generalized in the form: 

Theorem: If $F(x)$ is a polynomial with real coefficients and $y = G(x)$ is a
bell-shaped curve which is a solution of the differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then the necessary and sufficient condition: that $y$ be of unlimited range in both
directions is that $F(x)$ have no real roots; that $y$ be of limited range in one
direction is that all of the real roots of $F(x)$ lie on the same side of $P$; that $y$ be of
limited range in both directions is that at least one real root of $F(x)$ lie on one
side of $P$ and one on the other.
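For the Pearson case, where $F(x)$ is at most a quadratic, the combined Theorem reduces to locating the real roots of $F(x)$ relative to the mode $P$. The following is a minimal sketch under that assumption; the function names and the sample coefficients are illustrative and do not come from the paper.

```python
import math

def real_roots_quadratic(b2, b1, b0):
    """Real roots of b2 x^2 + b1 x + b0 (linear and constant cases included)."""
    if b2 == 0:
        return [] if b1 == 0 else [-b0 / b1]
    disc = b1 * b1 - 4 * b2 * b0
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted({(-b1 - s) / (2 * b2), (-b1 + s) / (2 * b2)})

def classify_range(roots, P):
    """Range of a bell-shaped solution, from the real roots of F(x) and the mode P."""
    left = any(r < P for r in roots)
    right = any(r > P for r in roots)
    if left and right:
        return "limited in both directions"
    if left or right:
        return "limited in one direction"
    return "unlimited in both directions"

# F(x) = -1 (normal curve), F(x) = -x with mode at 3, F(x) = x^2 - 1 with mode at 0.
print(classify_range(real_roots_quadratic(0, 0, -1), 0))
print(classify_range(real_roots_quadratic(0, -1, 0), 3))
print(classify_range(real_roots_quadratic(1, 0, -1), 0))
```

A root exactly at $P$ is not classified, since the proof of Theorem I excludes $F(P) = 0$ for bell-shaped curves.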

Corollary: $F(x)$ must be negative throughout the range of $y$.

Suppose now that we have some statistics which we wish to graduate and the 
statistics are of such nature that we would expect a bell-shaped curve, rather 
than a J- or U-shaped curve, and we desire the best fit: If we use a curve which 
is a solution of the differential equation 

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)}$$

(the Pearson Curves being special cases) to fit the statistics and if in computing
the constants for the curve one of the following cases arises:

(a) $b_0 \not< 0$ when this constant is computed,
or (b) $B_0 \not< 0$ when the origin is moved to the mode,
or (c) a root is located within the range of the statistics, then it means that:

1. A mistake may have been made in the computation: thus the Theorem
just established provides a rough check on the work of computation,

2. If no mistake has been made in the computation it may indicate that the
bell-shaped Pearson Curves will not closely fit the statistics and that some
other graduation curves should be used, e.g. the Gram-Charlier Types A or B might be
tried,

3. If no mistake has been made in the computation it may happen that one
of the bell-shaped Pearson Curves will give an excellent fit but a different method
than, or a modification of, the Method of Moments should be used in order to
compute the constants.
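The rough check follows directly from the Corollary that $F(x)$ must be negative throughout the range of $y$. The sketch below is one possible way of applying it, not the author's own procedure: it assumes the constants $B_2$, $B_1$, $B_0$ have been referred to the mode $P$, and it scans the observed range on a grid (a crude but simple substitute for solving the quadratic). The function name and the grid size `steps` are hypothetical.

```python
def rough_check(B2, B1, B0, x_min, x_max, P, steps=1000):
    """Return warnings for F(X) = B2 X^2 + B1 X + B0 with X = x - P.

    x_min and x_max bound the observed statistics; P is the computed mode.
    """
    warnings = []
    if B0 >= 0:
        warnings.append("constant term is not negative at the mode")
    # F must stay negative over the observed range; otherwise a root of F
    # lies inside (or at the edge of) the range of the statistics.
    for k in range(steps + 1):
        x = x_min + (x_max - x_min) * k / steps
        X = x - P
        if B2 * X * X + B1 * X + B0 >= 0:
            warnings.append("F(x) is not negative throughout the range of the statistics")
            break
    return warnings

print(rough_check(0, 0, -1, -3, 3, 0))   # normal-type constants: no warning
print(rough_check(1, 0, -1, -3, 3, 0))   # roots at X = -1 and X = 1 lie inside the range
```

An empty list means the check is passed; any warning calls for one of the three interpretations listed above.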

Part 3. Computing the Constants 

At present, the constants of a frequency curve are computed as follows:
First the moments are computed about an arbitrary origin, then the moments
about the A.M. are determined, then $\beta_1$ and $\beta_2$ and the criterion are computed,
after which the type of curve can be selected. From this point a separate
procedure is followed for each curve. Now in the above method one will not
know whether a root has been located in the range of statistics or not.

Take Pearson’s differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{b_2 x^2 + b_1 x + b_0}.$$

Put $X = x - P$. Then $dX = dx$ and $x = X + P$, and

$$\frac{dy}{dX} = \frac{yX}{b_2(X + P)^2 + b_1(X + P) + b_0} = \frac{yX}{b_2 X^2 + 2Pb_2 X + b_1 X + P^2 b_2 + Pb_1 + b_0}.$$



Now put

$$b_2 = B_2, \qquad 2Pb_2 + b_1 = B_1, \qquad P^2 b_2 + Pb_1 + b_0 = B_0.$$

Then we have

$$\frac{dy}{dX} = \frac{yX}{B_2 X^2 + B_1 X + B_0}$$

or

$$\frac{dy}{dx} = \frac{y(x - P)}{B_2(x - P)^2 + B_1(x - P) + B_0}.$$


It should be noted that for a particular curve, $B_2$, $B_1$ and $B_0$ are constants;
i.e., their values do not change with a change of the origin. The values of $b_1$
and $b_0$ do change with a change in the origin.

If we clear equation (1) of fractions, multiply by $e^{\eta x}$ and integrate with respect
to $x$ over the range from $x_1$ to $x_2$, where

$$e^{\lambda_1 \eta + \frac{\lambda_2 \eta^2}{2!} + \frac{\lambda_3 \eta^3}{3!} + \cdots} = \int_{x_1}^{x_2} e^{\eta x} y\,dx,$$

then successively differentiate with respect to $\eta$, and equate coefficients of
like powers of $\eta$, we finally obtain:

$$\begin{aligned}
\lambda_1 - P + B_1 - 2PB_2 + 2B_2\lambda_1 &= 0,\\
\lambda_2 + B_0 - PB_1 + P^2B_2 + B_1\lambda_1 - 2PB_2\lambda_1 + 3B_2\lambda_2 + B_2\lambda_1^2 &= 0,\\
\lambda_3 + 2\lambda_2B_1 - 4PB_2\lambda_2 + 4B_2\lambda_3 + 4B_2\lambda_1\lambda_2 &= 0,\\
\lambda_4 + 3B_1\lambda_3 - 6PB_2\lambda_3 + 5B_2\lambda_4 + 6B_2\lambda_2^2 + 6B_2\lambda_1\lambda_3 &= 0.
\end{aligned} \tag{2}$$

Since we can compute the moments from the raw statistics and the
semi-invariants from the moments, we may regard $\lambda_2$, $\lambda_3$ and $\lambda_4$ in these equations as
knowns and the $B_0$, $B_1$, $B_2$, $P$ and $\lambda_1$ as unknowns. But the origin has not yet
been specified. Let the origin be placed at the A.M. where $m = \lambda_1 = 0$. As
$\lambda_2$, $\lambda_3$, $\lambda_4$, $B_0$, $B_1$ and $B_2$ are unchanged by a change of origin, we have:

$$\begin{aligned}
B_1 - P_0 - 2P_0B_2 &= 0,\\
\lambda_2 + B_0 - P_0B_1 + P_0^2B_2 + 3B_2\lambda_2 &= 0,\\
\lambda_3 + 2B_1\lambda_2 - 4P_0B_2\lambda_2 + 4B_2\lambda_3 &= 0,\\
\lambda_4 + 3B_1\lambda_3 - 6P_0B_2\lambda_3 + 5B_2\lambda_4 + 6B_2\lambda_2^2 &= 0.
\end{aligned} \tag{3}$$


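Now put

$$\begin{aligned}
b_0' &= B_0 - P_0B_1 + P_0^2B_2,\\
b_1' &= B_1 - 2P_0B_2,\\
b_2' &= B_2;
\end{aligned} \tag{4}$$

then

$$\begin{aligned}
b_1' - P_0 &= 0,\\
\lambda_2 + b_0' + 3b_2'\lambda_2 &= 0,\\
\lambda_3 + 2b_1'\lambda_2 + 4b_2'\lambda_3 &= 0,\\
\lambda_4 + 3b_1'\lambda_3 + 5b_2'\lambda_4 + 6b_2'\lambda_2^2 &= 0.
\end{aligned} \tag{5}$$

By reversing the transformation (4) we get:

$$\begin{aligned}
B_2 &= b_2',\\
B_1 &= b_1' + 2P_0b_2',\\
B_0 &= b_0' + P_0(b_1' + P_0b_2').
\end{aligned} \tag{6}$$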


Now the above theory suggests the following procedure for computing the 
constants of a frequency curve: First the moments are computed about an 
arbitrary origin, then the semi-invariants are computed (or alternatively the 
moments about the A.M., either step involves about the same amount of work), 
then the equations (5) are solved and then by means of equations (6) the $B_2$,
$B_1$ and $B_0$ are computed. Next solve the quadratic equation

$$B_2X^2 + B_1X + B_0 = 0.$$


The character of the roots of this equation indicates which type to use and it is 
unnecessary to compute the criterion. The constants of the frequency curve 
are simple functions of the roots of the above quadratic equation and can be 
readily found by integrating the differential equation (1), being careful to write the solution
as a function of X = x — P. The rough checks mentioned in Part 2 can be 
quickly and conveniently applied when this procedure is followed. 
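The procedure just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's own computation: the semi-invariants $\lambda_2$, $\lambda_3$, $\lambda_4$ are taken as already computed about the A.M., the last two equations of (5) are solved as a pair of linear equations in $b_1'$ and $b_2'$, and (6) is then applied. The function name `pearson_constants` and the test values are hypothetical.

```python
from fractions import Fraction

def pearson_constants(lam2, lam3, lam4):
    """Solve equations (5), then apply (6); inputs are semi-invariants about the A.M."""
    lam2, lam3, lam4 = Fraction(lam2), Fraction(lam3), Fraction(lam4)
    # Last two equations of (5), linear in b1' and b2':
    #   2*lam2*b1 + 4*lam3*b2                = -lam3
    #   3*lam3*b1 + (5*lam4 + 6*lam2**2)*b2  = -lam4
    a11, a12, r1 = 2 * lam2, 4 * lam3, -lam3
    a21, a22, r2 = 3 * lam3, 5 * lam4 + 6 * lam2 ** 2, -lam4
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("equations (5) are singular for these semi-invariants")
    b1 = (r1 * a22 - a12 * r2) / det
    b2 = (a11 * r2 - a21 * r1) / det
    b0 = -lam2 * (1 + 3 * b2)        # second equation of (5)
    P0 = b1                          # first equation of (5): mode measured from the A.M.
    return {
        "P0": P0,
        "B2": b2,                            # equations (6)
        "B1": b1 + 2 * P0 * b2,
        "B0": b0 + P0 * (b1 + P0 * b2),
    }

# Semi-invariants 4, 8, 24 belong to the curve y = x^3 e^(-x) (a Type III curve).
print(pearson_constants(4, 8, 24))
```

For this example the sketch returns $B_2 = 0$, $B_1 = -1$, $B_0 = -3$ and $P_0 = -1$, so that $F(X) = -X - 3$ has its single root at $X = -3$, i.e. at the start of the curve, three units below the mode; the rough check of Part 2 is satisfied.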

George Washington University.



A RECONSIDERATION OF SHEPPARD’S CORRECTIONS 

By W. T. Lewis 1 

In computing the moments of a frequency distribution it is customary to find 
first what are known as the raw moments. These are obtained on the assump¬ 
tion that all the material of each class interval is concentrated at the middle 
point of the interval. It introduces what is called a grouping error because in 
fact the material does not all lie at the middle point. To compensate for this 
error W. F. Sheppard 2 derived a set of corrections. The hypothesis underlying 
his method is that the distribution may be regarded as similar to one to which 
the Euler-MacLaurin summation formula without its end terms may be applied. 
He presupposed such a curve, found its true moments, and then the raw moments 
that would be obtained if its area were concentrated at several equidistant 
abscissae. The relationship between these raw moments and the true moments 
of the curve furnished him with the corrections required for that distribution. 
If now our observed distribution may be supposed to be sufficiently like that one, 
we may use his corrections also on the observed data. One may note four points 
of criticism. 

(1) The given distribution may not be similar to the one suggested, in the 
sense that it would be close to such a curve if the intervals of grouping were 
made very small; or at all events the purpose of finding the moments may be in 
part to decide whether or not it would become such a curve, and so one would 
not like to assume that to be true at the outset. A special case of importance 
in which this last is true occurs when one is finding the moments of a sample in 
order to determine whether it may have been drawn from a presupposed universe. 
It is inexact to use raw moments but it is illogical to use corrections that have 
been proved only for the universe being tested. 

(2) Sheppard’s argument does not make use of the one certain fact that is 
given in the hypothesis, viz: that the partial area of the given distribution over 
each class interval is exactly as stated. In fact, if, following the argument of 
some authors, the given curve be assumed to be exponential, it obviously cannot 
have partial areas everywhere exactly equal to the several given frequencies, 
for in particular its partial area is not zero beyond the given range. 

(3) It is common to find distributions which do not have high contact at the
ends of the range and for them Sheppard’s corrections certainly fail. To
obviate this criticism new corrections have been derived by Pairman and Pearson
for the so-called abrupt cases. These new corrections are adequate to care
for the abrupt cases but involve so much computation that it is a fair question
whether it would not be simpler, first to distribute the given material over each
interval by a smoothing process, and then to find without corrections the
moments of the smoothed distribution.

1 With the assistance of Burton H. Camp.

2 The true values are given on page 220 of “Mathematical Part of Elementary Statistics,”
by Camp, D. C. Heath and Company, 1931.

(4) Even if one admits Sheppard’s method in general, waiving the dubious 
question as to whether it is proper to start with an assumed curve instead of 
starting with the given distribution, it is doubtful whether there are any curves 
which have exactly the properties required. The high contact hypothesis may 
be put in different language as follows: using the notation of the Handbook 3,
page 92, let $f(x)$ be the curve and $x_i$ be the middle point of the slice. It is
assumed that

$$\sum_i c\,x_i^r f^{(t)}(x_i) = \int_{-\infty}^{\infty} x^r f^{(t)}(x)\,dx; \qquad t = 0, 1, \cdots; \quad r = 0, 1, \cdots;$$

c being the class interval. This means that if the moments of the curve be 
found by using mid-ordinates times class interval , instead of areas , one will obtain 
exactly the true moments of the curve, and that this will remain true for all the 
curves which are derivatives of this curve. This property is certainly not true 
of the normal curve; but it is almost true when r and the class interval are both 
small, and it is probably due to this fact that Sheppard’s corrections seem to be 
good in practice. 
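The remark that the property is "almost true" for the normal curve when the class interval is small can be illustrated numerically. The sketch below is an illustration only: it takes the unit normal curve, places the mid-points at $x_i = ci$ (an arbitrary alignment), and compares the sum of mid-ordinates times class interval with the true second moment, which is 1. The function name and the truncation at `half_width` are assumptions.

```python
import math

def midpoint_moment(r, c, half_width=12.0):
    """Sum of c * x_i^r * phi(x_i) over mid-points x_i = c*i, phi the unit normal curve."""
    n = int(half_width / c)
    total = 0.0
    for i in range(-n, n + 1):
        x = c * i
        total += c * x ** r * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total

err_fine = abs(midpoint_moment(2, 0.5) - 1.0)    # class interval of half a standard deviation
err_coarse = abs(midpoint_moment(2, 2.0) - 1.0)  # class interval of two standard deviations
print(err_fine, err_coarse)
```

With the narrow class interval the discrepancy is negligible, while with the wide one it amounts to more than a tenth of the moment, which agrees with the statement that the hypothesis holds only approximately and only for small class intervals.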

Moreover, this high contact hypothesis cannot be true for any function over a 
limited range if the function is developable in Taylor’s series about one end of the 
range. For the only function which has the required properties is identically 
zero, since the function and all its derivatives are required to vanish at that end 
of the range. 

The primary purpose of this paper, therefore, is to derive corrections similar to
Sheppard’s with a different set of assumptions. The results may be used as an
approximate substitute for both Sheppard’s and Pairman’s. That is, they will
apply approximately to both extreme cases and to the intermediate cases; on the
whole they give better results than Sheppard’s and are not so difficult to
administer as Pairman’s.

The argument runs as follows. When a distribution is given merely by class 
intervals, there is no way of knowing exactly what the distribution would have 
been had the class intervals been smaller; we do not know that we have a sample 
from an exponential curve, and even if we did we would not know that this 
sample would lie close to the exponential in form. We shall, however, try to 
draw a graduating curve in such a manner that (a) its partial area over each class 
interval will equal the frequency of the given distribution over that interval; 
and (b) its form within each class interval will be such that it will pass smoothly 
into the adjacent portions to the right and left. A good way to do this is by a
freehand graph, frankly recognizing that there are many forms that will do
equally well. To obtain a numerical result it is necessary to use the equation
of some curve. Again frankly recognizing that there are many types which
will do equally well we choose the simplest to handle:

$$y = a + bt + ct^2.$$

3 H. L. Rietz, “Handbook of Math. Stat.” Houghton Mifflin Co. (1924).

Let the relative frequency distribution be defined by $f(i)$, $-m \leq i \leq n$, $m$, $n$, $i$
being integers. To satisfy (a) we have the equation

$$f(i) = \int_{i - \frac{1}{2}}^{i + \frac{1}{2}} y\,dt.$$

[Fig. 1]

To satisfy (b) we shall let

$$y = \tfrac{1}{2}\left[f(i) + f(i + 1)\right] \quad \text{if } t = i + \tfrac{1}{2}.$$

The latter will hold for all values of $i$ from $-m$ to $n - 1$ inclusive, but the end
intervals require special treatment. Here in order to satisfy as well as possible
both the high contact and the abrupt cases, we wish to let the material be
distributed according to the way the curve is behaving over the two nearest
intervals on the right (at $n$) or left (at $-m$) rather than by the addition of zero
frequencies beyond the given limits. To do this we let the slope of the
parabolas be zero at the extremes:

$$\frac{dy}{dt} = 0 \quad \text{at } t = -m - \tfrac{1}{2} \text{ and } t = n + \tfrac{1}{2}.$$




Then, if for example the frequencies are increasing as one nears the right end
interval, the curve will rise over the right end interval; if they are decreasing,
it will fall. These three conditions are sufficient to determine a continuous
curve of the sort indicated in the figure. The exact moments of the curve may
be found by integration and expressed in terms of the raw moments. The
details are tedious and of an elementary nature and will be given only for the
mean value $\bar{\nu}_1$.

To determine the coefficients of the parabola $y = a + bt + ct^2$ for the rectangle
at $t = i$ we may write the following three equations; the first complying with the
requirement that the area under the parabola from $t = i - \frac{1}{2}$ to $t = i + \frac{1}{2}$ equals
the area of the rectangle at $t = i$, the second and third giving the ordinates at
$i + \frac{1}{2}$ and $i - \frac{1}{2}$ respectively:

$$f(i) = \int_{i - \frac{1}{2}}^{i + \frac{1}{2}} (a + bt + ct^2)\,dt,$$

$$\frac{f(i) + f(i + 1)}{2} = a + b\left(i + \tfrac{1}{2}\right) + c\left(i + \tfrac{1}{2}\right)^2,$$

$$\frac{f(i) + f(i - 1)}{2} = a + b\left(i - \tfrac{1}{2}\right) + c\left(i - \tfrac{1}{2}\right)^2.$$


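Solving these three simultaneous equations we get for $a$, $b$, and $c$:

$$a = \left(\tfrac{5}{4} - 3i^2\right) f(i) + \left(\tfrac{3i^2}{2} - \tfrac{i}{2} - \tfrac{1}{8}\right) f(i + 1) + \left(\tfrac{3i^2}{2} + \tfrac{i}{2} - \tfrac{1}{8}\right) f(i - 1),$$

$$b = 6i\,f(i) + \left(\tfrac{1}{2} - 3i\right) f(i + 1) - \left(\tfrac{1}{2} + 3i\right) f(i - 1),$$

$$c = -3f(i) + \tfrac{3}{2} f(i + 1) + \tfrac{3}{2} f(i - 1),$$

and these hold for $-m + 1 \leq i \leq n - 1$.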
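The solution for $a$, $b$ and $c$ may be verified by substituting back into the three conditions. The sketch below does this in exact rational arithmetic; the function names and the sample frequencies are illustrative assumptions, and the expressions coded are those obtained by solving the area condition and the two ordinate conditions stated in the text.

```python
from fractions import Fraction as Fr

def interior_parabola(i, f_prev, f_mid, f_next):
    """Coefficients (a, b, c) of y = a + b t + c t^2 over the class centred at t = i.

    f_prev, f_mid, f_next are f(i-1), f(i), f(i+1).
    """
    f_prev, f_mid, f_next = Fr(f_prev), Fr(f_mid), Fr(f_next)
    c = -3 * f_mid + Fr(3, 2) * f_next + Fr(3, 2) * f_prev
    b = 6 * i * f_mid + (Fr(1, 2) - 3 * i) * f_next - (Fr(1, 2) + 3 * i) * f_prev
    a = ((Fr(5, 4) - 3 * i * i) * f_mid
         + (Fr(3, 2) * i * i - Fr(1, 2) * i - Fr(1, 8)) * f_next
         + (Fr(3, 2) * i * i + Fr(1, 2) * i - Fr(1, 8)) * f_prev)
    return a, b, c

def area(a, b, c, i):
    """Integral of a + b t + c t^2 from i - 1/2 to i + 1/2."""
    return a + b * i + c * (i * i + Fr(1, 12))

def ordinate(a, b, c, t):
    """Value of the parabola at abscissa t."""
    return a + b * t + c * t * t

# Hypothetical frequencies f(1) = 1/10, f(2) = 3/10, f(3) = 2/10 around the class i = 2.
a, b, c = interior_parabola(2, Fr(1, 10), Fr(3, 10), Fr(2, 10))
print(area(a, b, c, 2), ordinate(a, b, c, Fr(5, 2)), ordinate(a, b, c, Fr(3, 2)))
```

The three printed values reproduce $f(i)$ and the two averaged ordinates, so adjacent parabolas meet at the class boundaries as the construction requires.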

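For the parabola $y = a_1 + b_1t + c_1t^2$ over the first rectangle, i.e., where
$i = -m$, we get the equations:

$$f(-m) = \int_{-m - \frac{1}{2}}^{-m + \frac{1}{2}} (a_1 + b_1t + c_1t^2)\,dt,$$

$$\frac{f(-m) + f(-m + 1)}{2} = a_1 + b_1\left(-m + \tfrac{1}{2}\right) + c_1\left(-m + \tfrac{1}{2}\right)^2,$$

$$b_1 + 2c_1\left(-m - \tfrac{1}{2}\right) = 0,$$

and their solutions:

$$a_1 = \tfrac{3}{4}\left(m^2 + m - \tfrac{1}{12}\right) f(-m + 1) - \tfrac{3}{4}\left(m^2 + m - \tfrac{17}{12}\right) f(-m),$$

$$b_1 = \tfrac{3}{4}(2m + 1) f(-m + 1) - \tfrac{3}{4}(2m + 1) f(-m),$$

$$c_1 = \tfrac{3}{4} f(-m + 1) - \tfrac{3}{4} f(-m).$$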



Similarly for the parabola $y = a_n + b_nt + c_nt^2$ through the last rectangle at
$i = n$ we get

$$f(n) = \int_{n - \frac{1}{2}}^{n + \frac{1}{2}} (a_n + b_nt + c_nt^2)\,dt,$$

$$\frac{f(n) + f(n - 1)}{2} = a_n + b_n\left(n - \tfrac{1}{2}\right) + c_n\left(n - \tfrac{1}{2}\right)^2,$$

$$b_n + 2c_nn + c_n = 0,$$

and for the constants

$$a_n = \tfrac{3}{4}\left(n^2 + n - \tfrac{1}{12}\right) f(n - 1) - \tfrac{3}{4}\left(n^2 + n - \tfrac{17}{12}\right) f(n),$$

$$b_n = -\tfrac{3}{4}(1 + 2n) f(n - 1) + \tfrac{3}{4}(1 + 2n) f(n),$$

$$c_n = \tfrac{3}{4} f(n - 1) - \tfrac{3}{4} f(n).$$
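The end parabolas can be checked in the same way as the interior ones. The following sketch, with hypothetical function names and sample frequencies, codes the constants in an algebraically equivalent form ($a_1 = f(-m) + c_1(m^2 + m - \frac{1}{12})$ and similarly for $a_n$), so that the area, ordinate and zero-slope conditions may be verified exactly.

```python
from fractions import Fraction as Fr

def first_parabola(m, f_first, f_second):
    """(a_1, b_1, c_1) for the class centred at t = -m; f_first = f(-m), f_second = f(-m+1)."""
    c1 = Fr(3, 4) * (Fr(f_second) - Fr(f_first))
    b1 = (2 * m + 1) * c1
    a1 = Fr(f_first) + c1 * (m * m + m - Fr(1, 12))
    return a1, b1, c1

def last_parabola(n, f_before, f_last):
    """(a_n, b_n, c_n) for the class centred at t = n; f_before = f(n-1), f_last = f(n)."""
    cn = Fr(3, 4) * (Fr(f_before) - Fr(f_last))
    bn = -(2 * n + 1) * cn
    an = Fr(f_last) + cn * (n * n + n - Fr(1, 12))
    return an, bn, cn

# Hypothetical end frequencies: f(-2) = 1/10, f(-1) = 3/10 and f(2) = 2/10, f(3) = 1/10.
a1, b1, c1 = first_parabola(2, Fr(1, 10), Fr(3, 10))
an, bn, cn = last_parabola(3, Fr(2, 10), Fr(1, 10))
print(b1 + 2 * c1 * Fr(-5, 2), bn + 2 * cn * Fr(7, 2))   # slopes at the two extremes
```

Both printed slopes are zero, and the area under each end parabola equals the frequency of its class, so the curve rises or falls over an end interval according to the trend of the two nearest frequencies.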


Having obtained the constants for the graduating curve we will determine
the moments of this curve in terms of those of the given frequency distribution.

Notation: Let the class interval be $c = 1$; let $\nu_s = \sum_{i=-m}^{n} i^s f(i)$ be the
uncorrected $s$th moment of the given frequency distribution about the given origin;
let $\mu_s = \sum_{i=-m}^{n} (i - \nu_1)^s f(i)$ be the uncorrected $s$th moment of the given
frequency distribution about its uncorrected mean; let $\bar{\nu}_s$ be the corrected value of
the $s$th moment about the given origin; and let $\bar{\mu}_s$ be the corrected value of
the $s$th moment about the corrected mean. Thus $\nu_s$ and $\mu_s$ apply to the
rectangles, and $\bar{\nu}_s$ and $\bar{\mu}_s$ apply to the curves as follows:

$$\bar{\nu}_s = \sum_{i=-m+1}^{n-1} \int_{i - \frac{1}{2}}^{i + \frac{1}{2}} t^s (a + bt + ct^2)\,dt + \int_{-m - \frac{1}{2}}^{-m + \frac{1}{2}} t^s (a_1 + b_1t + c_1t^2)\,dt + \int_{n - \frac{1}{2}}^{n + \frac{1}{2}} t^s (a_n + b_nt + c_nt^2)\,dt,$$

$$\bar{\mu}_s = \sum_{i=-m+1}^{n-1} \int_{i - \frac{1}{2}}^{i + \frac{1}{2}} (t - \bar{\nu}_1)^s (a + bt + ct^2)\,dt + \int_{-m - \frac{1}{2}}^{-m + \frac{1}{2}} (t - \bar{\nu}_1)^s (a_1 + b_1t + c_1t^2)\,dt + \int_{n - \frac{1}{2}}^{n + \frac{1}{2}} (t - \bar{\nu}_1)^s (a_n + b_nt + c_nt^2)\,dt.$$


Using these symbols we have for the first moment about the given origin: 


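$$\bar\nu_1 = \sum_{i=-m+1}^{n-1} \int_{i-\frac12}^{i+\frac12} t(a + bt + ct^2)\,dt + \int_{-m-\frac12}^{-m+\frac12} t(a_1 + b_1 t + c_1 t^2)\,dt + \int_{n-\frac12}^{n+\frac12} t(a_n + b_n t + c_n t^2)\,dt$$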



16 


W. T. LEWIS 


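$$\bar\nu_1 = \sum_{i=-m+1}^{n-1} \left[ a i + b\left(i^2 + \tfrac{1}{12}\right) + c\left(i^3 + \tfrac{i}{4}\right) \right] + \left[ -a_1 m + b_1\left(m^2 + \tfrac{1}{12}\right) - c_1\left(m^3 + \tfrac{m}{4}\right) \right] + \left[ a_n n + b_n\left(n^2 + \tfrac{1}{12}\right) + c_n\left(n^3 + \tfrac{n}{4}\right) \right].$$

Substituting the values for the constants this becomes

$$\begin{aligned}
\bar\nu_1 = {} & \sum_{i=-m+1}^{n-1} \Big\{ i \left[ \left(\tfrac54 - 3i^2\right) f(i) + \left(\tfrac32 i^2 - \tfrac{i}{2} - \tfrac18\right) f(i+1) + \left(\tfrac32 i^2 + \tfrac{i}{2} - \tfrac18\right) f(i-1) \right] \\
& \quad + \left(i^2 + \tfrac{1}{12}\right) \left[ 6i\,f(i) + \left(\tfrac12 - 3i\right) f(i+1) - \left(\tfrac12 + 3i\right) f(i-1) \right] \\
& \quad + \left(i^3 + \tfrac{i}{4}\right) \left[ -3f(i) + \tfrac32 f(i+1) + \tfrac32 f(i-1) \right] \Big\} \\
& + \Big\{ -m \left[ \tfrac34\left(m^2 + m - \tfrac{1}{12}\right) f(-m+1) - \tfrac34\left(m^2 + m - \tfrac{17}{12}\right) f(-m) \right] \\
& \quad + \left(m^2 + \tfrac{1}{12}\right) \left[ \tfrac34 (2m+1) f(-m+1) - \tfrac34 (2m+1) f(-m) \right] \\
& \quad - \left(m^3 + \tfrac{m}{4}\right) \left[ \tfrac34 f(-m+1) - \tfrac34 f(-m) \right] \Big\} \\
& + \Big\{ n \left[ \tfrac34\left(n^2 + n - \tfrac{1}{12}\right) f(n-1) - \tfrac34\left(n^2 + n - \tfrac{17}{12}\right) f(n) \right] \\
& \quad + \left(n^2 + \tfrac{1}{12}\right) \left[ -\tfrac34 (1+2n) f(n-1) + \tfrac34 (1+2n) f(n) \right] \\
& \quad + \left(n^3 + \tfrac{n}{4}\right) \left[ \tfrac34 f(n-1) - \tfrac34 f(n) \right] \Big\}.
\end{aligned}$$

$$\bar\nu_1 = \sum_{i=-m+1}^{n-1} \left[ i f(i) + \tfrac{1}{24} f(i+1) - \tfrac{1}{24} f(i-1) \right] + \tfrac{1}{16} f(-m+1) - \left(m + \tfrac{1}{16}\right) f(-m) - \tfrac{1}{16} f(n-1) + \left(n + \tfrac{1}{16}\right) f(n).$$

$$\sum_{i=-m+1}^{n-1} i f(i) = \sum_{i=-m}^{n} i f(i) - (-m) f(-m) - n f(n) = \nu_1 + m f(-m) - n f(n).$$

$$\tfrac{1}{24} \sum_{i=-m+1}^{n-1} f(i+1) = \tfrac{1}{24} \sum_{i=-m+2}^{n} f(i) = \tfrac{1}{24} - \tfrac{1}{24} f(-m+1) - \tfrac{1}{24} f(-m).$$

$$\tfrac{1}{24} \sum_{i=-m+1}^{n-1} f(i-1) = \tfrac{1}{24} \sum_{i=-m}^{n-2} f(i) = \tfrac{1}{24} - \tfrac{1}{24} f(n-1) - \tfrac{1}{24} f(n).$$

$$\begin{aligned}
\bar\nu_1 = {} & \nu_1 + m f(-m) - n f(n) + \tfrac{1}{24} - \tfrac{1}{24} f(-m+1) - \tfrac{1}{24} f(-m) - \tfrac{1}{24} + \tfrac{1}{24} f(n-1) + \tfrac{1}{24} f(n) \\
& + \tfrac{1}{16} f(-m+1) - \left(m + \tfrac{1}{16}\right) f(-m) - \tfrac{1}{16} f(n-1) + \left(n + \tfrac{1}{16}\right) f(n),
\end{aligned}$$

$$\bar\nu_1 = \nu_1 - \tfrac{5}{48} f(-m) + \tfrac{5}{48} f(n) + \tfrac{1}{48} f(-m+1) - \tfrac{1}{48} f(n-1).$$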
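The closed form for the corrected first moment can be compared with a direct piecewise integration of the graduating curve. The sketch below is only an illustrative check under the formulae for the constants given above; the function names and the sample distribution (with $m = 2$, $n = 3$) are assumptions introduced for the example.

```python
from fractions import Fraction as Fr

HALF, K = Fr(1, 2), Fr(3, 4)

def graduating_pieces(f, m, n):
    """Return [(a, b, c, lo, hi), ...], one parabola per rectangle i = -m, ..., n."""
    pieces = []
    for i in range(-m, n + 1):
        if i == -m:      # first rectangle
            a = K * (m * m + m - Fr(1, 12)) * f[i + 1] - K * (m * m + m - Fr(17, 12)) * f[i]
            b = K * (2 * m + 1) * (f[i + 1] - f[i])
            c = K * (f[i + 1] - f[i])
        elif i == n:     # last rectangle
            a = K * (n * n + n - Fr(1, 12)) * f[i - 1] - K * (n * n + n - Fr(17, 12)) * f[i]
            b = -K * (2 * n + 1) * (f[i - 1] - f[i])
            c = K * (f[i - 1] - f[i])
        else:            # interior rectangle
            a = ((Fr(5, 4) - 3 * i * i) * f[i]
                 + (Fr(3, 2) * i * i - Fr(i, 2) - Fr(1, 8)) * f[i + 1]
                 + (Fr(3, 2) * i * i + Fr(i, 2) - Fr(1, 8)) * f[i - 1])
            b = 6 * i * f[i] + (HALF - 3 * i) * f[i + 1] - (HALF + 3 * i) * f[i - 1]
            c = -3 * f[i] + Fr(3, 2) * (f[i + 1] + f[i - 1])
        pieces.append((a, b, c, i - HALF, i + HALF))
    return pieces

def raw_moment(pieces, s):
    """Exact integral of t**s times the graduating curve over the whole range."""
    total = Fr(0)
    for a, b, c, lo, hi in pieces:
        G = lambda t: a * t**(s + 1) / (s + 1) + b * t**(s + 2) / (s + 2) + c * t**(s + 3) / (s + 3)
        total += G(hi) - G(lo)
    return total

def closed_form_mean(f, m, n):
    """The corrected first moment as given by the final formula above."""
    nu1 = sum(i * f[i] for i in range(-m, n + 1))
    return (nu1 - Fr(5, 48) * f[-m] + Fr(5, 48) * f[n]
            + Fr(1, 48) * f[-m + 1] - Fr(1, 48) * f[n - 1])

# Hypothetical relative frequencies summing to unity.
f = {-2: Fr(1, 20), -1: Fr(3, 20), 0: Fr(6, 20), 1: Fr(5, 20), 2: Fr(3, 20), 3: Fr(2, 20)}
pieces = graduating_pieces(f, 2, 3)
print(raw_moment(pieces, 0) == 1)                          # total area is preserved
print(raw_moment(pieces, 1) == closed_form_mean(f, 2, 3))  # the two first moments agree
```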


Using this same notation and method for the higher moments we get 

2 = " 2 - k ■ p ‘ + (S' 1 + so) /( ~ m) + & + so) f{n) 


M3 = ^3 — 3vi/Z 2 ~ 


^1 


+ /(«) 


r* 

Lie 


^ ( "24 21o) ^ m + X) + ( 24 24o)^ W ^ ' 


+ 80 ” + 120 


.,/ , i\ r tw2 , w , 1 

_ +/(_W + 1) [l0 + 80 + 120. 


+y<» - o[t^- 4-ii. 


A - n-2 -4 p2 V 1 17 

M4 = v A — 4/x 3 *'i - omv l — 


+ 


/(-m) 


5m 3 21m 2 17m 

12 + ~40 + W + 


313 1 f( S 5n 3 

1680 J + /(")[_ 12 


5n 3 21n 2 17n 313 

+ 40' + 30 + 1680 


'I 3 .! 

380 J 


. tl , ..X —m* rri 2 m 1 "1 , ,, ,X —n 3 n 2 n 1 

+ /( wl + 1 )[ 12 40 — 30 336j + ^” 1} L 12 40 30 336_ * 


SPECIAL CASES 

The above formulae are rather long and in practice the special cases below 
will frequently be preferred. 

(a) We may usually take the origin at or very near the middle of the range so 
that m = n, at least approximately. 





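If $m = n$:

$$\bar\nu_1 = \nu_1 - \tfrac{5}{48} f(-m) + \tfrac{5}{48} f(n) + \tfrac{1}{48} f(-m+1) - \tfrac{1}{48} f(n-1).$$

$$\bar\mu_2 = \nu_2 - \tfrac{1}{12} - \bar\nu_1^2 + \left(\tfrac{5m}{24} + \tfrac{7}{80}\right) [f(-m) + f(n)] + \left(-\tfrac{m}{24} - \tfrac{1}{240}\right) [f(-m+1) + f(n-1)].$$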

ji 3 = v 3 - 3PlM2 - - P? + jjr + 2 g^* + - /(- Wl )l 

+ [l6 + 80 + 12o]^ _m + ^ ~^ n ~ ^ ' 
a - - A-- 2 - 4 M 2 Pi 17 

M4 = ^4 - 4ju 3 ^i - OMa*'! “ *1 — 2 ~~ 2" ~ 64 

+ [ \~2 ~ W ~ U “ 336] l/(_m + 1} +f{n ~ 1)1 

+ [% +■£+\ 7 <r+io!l] t/(_m) +f(n)] ■ 

(b) Except in the abrupt cases the end frequencies and the difference between
those next to the ends will be so small (relative to unity) that they will have a
negligible effect on the corrections. If $m = n$ as in (a), and if also

$f(-m) = f(n) = 0$ and $f(-m+1) - f(n-1) = 0$:

$$\bar\nu_1 = \nu_1.$$

$$\bar\mu_2 = \nu_2 - \nu_1^2 - \tfrac{1}{12} + f(-m+1) \left[ -\tfrac{m}{12} - \tfrac{1}{120} \right].$$


q- - V\ -3 

M3 = *3 — OV 1M2 — — V\ - 

4 

M4 — V\ 4m 3 Pi ~ 6m 2 P? — p{ 


- 2 

M2 _ 

2 2 


17 

64 


+ /(-*+ 1)[=^‘ 


m 2 w 
20 ~15 


1_ 

168 




These formulae have been written in the form which makes the computing 
simple. The following makes a comparison with Sheppard’s corrections easy. 





Pi = Pi . 


M2 = M2 22 + /(-W + 1) 


M3 — M3 + 


M4 = M4 — 


, ' 1 (T + ^) /(-m + 1) - 


43 

192 


m 

40 


1 

560 ~ 



The following special case is also useful in comparing my formulae with 
Sheppard’s. 

(c) Let $f(-m) = \tfrac15 f(-m+1)$ and $f(n) = \tfrac15 f(n-1)$. This produces a
graduating curve which is exactly tangent to the $t$-axis at the ends of the range
and is everywhere continuous, though it does not have continuous derivatives
at certain isolated points. It is, however, a curve which to the eye cannot be
distinguished from the type assumed in the Euler-MacLaurin theorem, which
lies at the base of Sheppard’s formulae. My corrections become:

$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \mu_2 - \tfrac{1}{12} + \tfrac{1}{15} [f(-m) + f(n)],$$

M3 = M3 — jt [/( —m) +/(w)] + £ ^ — ^ > 

M4 = M4 - ^ - JQ2 + I [ ( " 1 + m)2 + Vl + m + 

+ ~ rt ) 2 — v\ + n + • 

Sheppard’s are: 


$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \mu_2 - \tfrac{1}{12},$$

$$\bar\mu_3 = \mu_3,$$

$$\bar\mu_4 = \mu_4 - \tfrac{\mu_2}{2} + \tfrac{7}{240}.$$

Let us compare my results with Sheppard’s in the very special case in which
$f(-m) = f(n) = 1/7$, $f(0) = 5/7$, $m = n = 1$. The odd moments vanish.
My corrections for $\mu_2$ and $\mu_4$ are

$$\bar\mu_2 = 0.2214, \qquad \bar\mu_4 = 0.1870.$$





Sheppard’s are

$$\bar\mu_2 = 0.2024, \qquad \bar\mu_4 = 0.1720.$$

The numerical difference between the $\bar\mu_2$’s is 0.0190, and the numerical difference
between the $\bar\mu_4$’s is 0.0150.
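The second-moment figures in this comparison, and Sheppard's fourth-moment figure, can be reproduced with exact arithmetic. The sketch below assumes the end and interior constants given earlier and integrates the graduating curve directly; it is an illustrative check only, it does not attempt the fourth moment of the graduating curve, and all function names are hypothetical.

```python
from fractions import Fraction as Fr

HALF, K = Fr(1, 2), Fr(3, 4)

def graduating_pieces(f, m, n):
    """One parabola (a, b, c, lo, hi) per rectangle, using the constants given above."""
    pieces = []
    for i in range(-m, n + 1):
        if i == -m:
            a = K * (m * m + m - Fr(1, 12)) * f[i + 1] - K * (m * m + m - Fr(17, 12)) * f[i]
            b = K * (2 * m + 1) * (f[i + 1] - f[i])
            c = K * (f[i + 1] - f[i])
        elif i == n:
            a = K * (n * n + n - Fr(1, 12)) * f[i - 1] - K * (n * n + n - Fr(17, 12)) * f[i]
            b = -K * (2 * n + 1) * (f[i - 1] - f[i])
            c = K * (f[i - 1] - f[i])
        else:
            a = ((Fr(5, 4) - 3 * i * i) * f[i]
                 + (Fr(3, 2) * i * i - Fr(i, 2) - Fr(1, 8)) * f[i + 1]
                 + (Fr(3, 2) * i * i + Fr(i, 2) - Fr(1, 8)) * f[i - 1])
            b = 6 * i * f[i] + (HALF - 3 * i) * f[i + 1] - (HALF + 3 * i) * f[i - 1]
            c = -3 * f[i] + Fr(3, 2) * (f[i + 1] + f[i - 1])
        pieces.append((a, b, c, i - HALF, i + HALF))
    return pieces

def raw_moment(pieces, s):
    """Exact integral of t**s times the graduating curve."""
    total = Fr(0)
    for a, b, c, lo, hi in pieces:
        G = lambda t: a * t**(s + 1) / (s + 1) + b * t**(s + 2) / (s + 2) + c * t**(s + 3) / (s + 3)
        total += G(hi) - G(lo)
    return total

# The special case of the text: f(-1) = f(1) = 1/7, f(0) = 5/7, m = n = 1.
f = {-1: Fr(1, 7), 0: Fr(5, 7), 1: Fr(1, 7)}
pieces = graduating_pieces(f, 1, 1)

mu2 = sum(i**2 * f[i] for i in f)   # uncorrected moments (the mean is zero)
mu4 = sum(i**4 * f[i] for i in f)

curve_mu2 = raw_moment(pieces, 2) - raw_moment(pieces, 1) ** 2
sheppard_mu2 = mu2 - Fr(1, 12)
sheppard_mu4 = mu4 - mu2 / 2 + Fr(7, 240)

print(float(curve_mu2), float(sheppard_mu2), float(sheppard_mu4))
print(curve_mu2 == mu2 - Fr(1, 12) + Fr(1, 15) * (f[-1] + f[1]))
```

Run as written, the three printed values round to 0.2214, 0.2024 and 0.1720 respectively.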

This example shows that Sheppard’s corrections are not valid to the precision 
to which they are usually given if they are to be used for the purpose of correcting 
raw moments. The last term in the fourth moment correction, 7/240, might 
equally well be, for example, —43/192 as in my special case. This will become 
more evident to the reader if he will draw the curve indicated in this example. 
To the eye it will appear exactly like the kind specified in the Euler-MacLaurin 
theorem; for example, much like the normal curve. Now suppose one adopted 
for the moment the point of view (which I have criticized earlier) of starting 
with the curve used in this example, breaking it up into three partial areas and 
then finding the relation between the true and the raw moments. The partial 
areas found would be exactly those used in this example and this method would 
give us Sheppard’s corrections, but they would not be exactly correct, for in this 
instance my formulae give exactly the relationship between the true and the raw 
moments. The difference is due to the fact that in this instance the assump¬ 
tions permitting the use of the Euler-MacLaurin theorem in abbreviated form 
are not justified for this curve. But there is no way of telling at the outset, if 
one has given initially only the partial areas, whether precisely this curve or 
another which to the eye would appear very much like it is truly the curve which 
will graduate the same material when subjected to a finer classification. 



THE POINT BINOMIAL AND PROBABILITY PAPER 

By Frank H. Byron 1 

1. An approximation to the sum of a number of consecutive terms of the point
binomial may be found graphically and quite expeditiously by means of so-called
“probability paper.” This paper is ruled so that the $(x, y)$ graph of the
equation of the integral of the normal curve 



is a straight line. Let the successive terms of the point binomial be represented
as follows:

$$(p + q)^n = u_0 + u_1 + \cdots + u_t + \cdots + u_n, \tag{2}$$

where $u_t = {}_nC_t\, p^{n-t} q^t$ and $p \ge q$. Then the $(x, y)$ graph of the equation,

$$y = \sum_{i=0}^{t} u_i, \qquad t + \tfrac12 = x, \tag{3}$$

i.e., of the sum of the first $(t + 1)$ terms of this point binomial, is, in all but extreme
cases, a set of points lying on a gently turning curve, so gently that its form may 
be represented closely by two straight lines, each passing through the median 
point as will be explained in the next section. As paper of this sort is readily 
obtainable, and as this method yields as great accuracy as is really useful in 
many problems, it is suggested that its use ought to be quite general. 

2. Sheppard’s Corrections. The formulae for the moments of the point
binomial, mean $= qn$, $\sigma^2 = pqn$, are exact without any corrections such as are
used for grouped material. This fact has led us all (apparently) to assume that 
in fitting the curve to the point binomial one would get a better fit by equating 
the moments of the curve to the uncorrected moments of the point binomial 
rather than to the corrected moments. The studies made in connection with 
the preparation of this paper show that when the purpose is to equate areas to 
sums of terms the corrected moments should be used. The theoretical basis 
for this conclusion is as follows: 

To simplify the argument let us suppose that one were seeking that curve of 
Charlier type, 

$$F(x) = c_0\phi_0(x) + c_1\phi_1(x) + \cdots + c_4\phi_4(x), \tag{4}$$

1 With the assistance of Burton H. Camp. 



(where $\phi_0$ is the normal curve and $\phi_1, \phi_2, \ldots$ its successive derivatives) whose
integral would best fit the graph of (3). Since fitting is required only at the
isolated points $x = \tfrac12, 1\tfrac12, 2\tfrac12, \ldots$, it is clear that one might obtain this by the
two following steps. First let $f(x)$ be any function whose integral meets exactly
the requirement at these isolated points. What values this integral has at other
points does not for the moment concern us. There are an infinite number of
such $f(x)$ curves. Next let the $c_i$’s of (4) be so chosen that $F(x)$ will fit $f(x)$ as
nearly as possible. The ordinary derivation of the $c$’s supposes that the fit
between $f(x)$ and $F(x)$ is to be made by least squares, the residuals being weighted
by the factor $1/\sqrt{\phi(x)}$. No matter what $f(x)$ is chosen, the $c$’s can be determined
so that the weighted integral of $(f(x) - F(x))^2$ will be a minimum, but the
value of this minimum will vary from one $f(x)$ to another. We now desire to
select that $f(x)$ which will make this minimum value as small as possible, and
it is reasonable to suppose that our best selection will be some $f(x)$ which is as
kindred to the nature of $F(x)$ as possible. We shall not therefore choose an
$f(x)$ which oscillates wildly between the points where perfect fitting is required
(Fig. 1), nor yet an $f(x)$ which is made up of the top bases of the point binomial



histogram; we shall prefer a modification (Fig. 2) of that histogram by a smoothing
process. Such an $f(x)$ will not have the exact moments of the point binomial,
but, more nearly, those moments corrected for grouping. Then the determination
of the $c$’s will come out in terms of these corrected moments, not in terms of
the uncorrected moments. (In fact the uncorrected moments would be the
exact moments of an $f(x)$ having an oscillatory character between the important
points.)

Of course, when $n$ is large, the difference is too small to be noticed and the use
of Sheppard's corrections is not worth while, and since $n$ usually is large when
approximations of this sort are needed, the point is not usually important. It
was important in the computation of the tables of §4. Moreover, the use of
Sheppard's corrections does not invariably yield better results, the gain being
masked sometimes by other effects to be considered in §3. An excellent illustration
of uniformly better results is in fitting $(\tfrac12 + \tfrac12)^9$ by a curve of type (4).
The errors in the sums as derived from (4), with and without the corrections, are
given below.



    t                        0       1       2       3       4       5       6       7       8       9

    With Corrections       .0002   .0001  -.0003  -.0001   .0000   .0001   .0003  -.0001  -.0002   .0000

    Without Corrections    .0007   .0022   .0039   .0036   .0000  -.0036  -.0039  -.0022  -.0007  -.0001


3. The Stubby End. The other effects which mask this improvement are
especially noticeable at the stubby end of a point binomial. We have to keep
in mind here that the approximating curve (such as (4)) is required to turn a
sharp corner, for, due to the least square method of fitting, it is just as important
that it be close to zero when $t$ is negative, as it is that it be close to $u_0, u_1, \ldots$
when $t$ is positive. Therefore, in order to turn this corner it has to dip below the
$x$-axis in the neighborhood of $t = -\tfrac12$. This makes the approximating curve too
low just to the right of $t = -\tfrac12$ unless the whole curve be arbitrarily widened.
This arbitrary widening is customarily performed by not using Sheppard's
correction for $\sigma$, and the result is a betterment of the fit at these points but a
corresponding loss over the rest of the infinite interval. A good example 2 is
$([\text{fractions illegible}])^{25}$. The fit is worse at the left end when Sheppard's corrections are used
but better over the rest of the interval.

The same difficulty arises in another connection. If we compare the closeness
of fit to a point binomial made by $F(x)$ as written in (4) and by $F(x)$ as it would
be written if $c_4$ were zero, it often happens (as is well known) that the latter is
actually slightly better on the average. How can this be true if the $c$'s are
chosen by the method of least squares and the best choice as thus indicated
makes $c_4$ different from zero? The answer is that the $c$'s are chosen so that the
fit is best over the infinite interval, not merely over the interval from $t = -\tfrac12$
to $t = n + \tfrac12$, and that furthermore the distant points are weighted more heavily
than those near the center. Thus it might happen that a choice, other than the
least square choice, and one in which $c_4$ would be zero, might be better for the
restricted interval covered by the point binomial. This does happen especially
when, due to the abruptness of the stubby end of a very skew binomial, the
curve has to dip below the axis in order to get by a sharp corner. A good example
is the problem considered by Fry: 3 $([\text{fractions illegible}])^{100}$. All the effects mentioned
are present here. The fit is on the average a little worse if $c_4$ is not equal
to zero over the point binomial interval, a little better over the infinite interval.

4. For graphical purposes a sufficiently good approximation to the median of
$(p + q)^n$ is given by

$$M = nq - (p - q)/6.$$


2 The true values are given on page 220 of Mathematical Part of Elementary Statistics,
by Camp, D. C. Heath and Company, 1931.

3 T. C. Fry, Probability and its Engineering Uses, p. 258, Van Nostrand, 1928. 






The following tables enable us to find the first quartile $Q_1$, and the ninth decile
$D_9$. The accuracy to which they can be plotted is only about one-tenth that to
which they are given here. Therefore accurate interpolation is seldom necessary.
The values of $S_{t+\frac12}$ are to be read from the graph at the points $t + \tfrac12$ as
indicated in the directions preceding the tables. The graphical method will be
found efficient if one uses common sense in the computation. Numbers which
are to be plotted should not be computed to a higher degree of accuracy than
can be used graphically. In reading the values of $S_{t+\frac12}$ it is well to remember
that the true values lie on a curve, and that outside the interval from $Q_1$ to
$D_9$ they are slightly less than those given by the straight line. Once the graph has
been made, all the values of $S_{t+\frac12}$ can be read quickly; it is not necessary to make
a separate computation for each $t$. This method is therefore specially advantageous
when one wishes to find several sums of this sort for the same point
binomial. It should also be noticed that one can tell from the appearance of
the graph about how far the true sum would be from the two straight lines and
so estimate the error to which his reading is liable.

5. Illustration. Find the sum of the first 7 terms of $(\tfrac23 + \tfrac13)^{25}$.

Here $t = 6$, $M = 8.278$, $Q_1 = 6.726$, $D_9 = 11.369$. The graph shows that
$\sum_{0}^{t} u_i = 0.224$. The true value is 0.222. So the error is 0.002.
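The illustration can be imitated numerically. The sketch below rests on several assumptions that are not stated in this form in the text: that the binomial is $(\tfrac23 + \tfrac13)^{25}$ (the only value of $q$ consistent with $M = 8.278$), that the ruling of probability paper is equivalent to plotting the inverse normal integral of $y$ against $x$, and that the reading is taken on the straight line through $M$ and $Q_1$ at $x = t + \tfrac12$. The values of $Q_1$ and $D_9$ are simply those quoted in the illustration, and the function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

def binomial_partial_sum(n, q, t):
    """Exact sum of the first t + 1 terms u_0 + ... + u_t, with u_i = C(n, i) p^(n-i) q^i."""
    p = 1 - q
    return sum(comb(n, i) * p**(n - i) * q**i for i in range(t + 1))

def median_approx(n, q):
    """The approximation M = nq - (p - q)/6 to the median."""
    p = 1 - q
    return n * q - (p - q) / 6

def two_line_reading(x, M, Q1, D9):
    """Value read at abscissa x from the lines M-Q1 and M-D9 on probability paper.

    On probability paper the ordinate y is placed at height z = inverse normal
    integral of y, so each straight line is linear in z (an assumed model of the paper).
    """
    normal = NormalDist()
    if x <= M:
        z = normal.inv_cdf(0.25) * (M - x) / (M - Q1)
    else:
        z = normal.inv_cdf(0.90) * (x - M) / (D9 - M)
    return normal.cdf(z)

n, q, t = 25, 1 / 3, 6
exact = binomial_partial_sum(n, q, t)
M = median_approx(n, q)
reading = two_line_reading(t + 0.5, M, Q1=6.726, D9=11.369)
print(round(M, 3), round(exact, 4), round(reading, 4))
```

The computed line value need not coincide with the 0.224 read from the author's graph, since a reading by eye carries its own error; the sketch only indicates that the two-line construction lands within a few thousandths of the exact sum.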






An idea of the accuracy of the method is given by the errors (out of two places)
that would be obtained for this point binomial for various values of $t$, as follows:

    t        [first four values illegible]     10    12    14    16

    Errors   00    01    00    00              00    00    00    00




Directions for Use of the Tables: Let $p = q$, $M = nq - (p - q)/6$,
$Q_1 = x_1 + qn$, $D_9 = x_9 + qn$. On the graph draw the lines $MQ_1$ and $MD_9$.
Read at $t + \tfrac12$.

















                                        Values of x_1

     p \ n   2000    1000     750     500     400     300     200     100      75      50      25

     .99    -.693   -.701   -.704   -.710   -.714   -.720   -.728   -.747   -.756   -.771   -.804
     .98    -.688   -.693   -.696   -.700   -.703   -.707   -.714   -.728   -.735   -.746   -.770
     .97    -.685   -.690   -.692   -.696   -.698   -.701   -.707   -.718   -.724   -.734   -.784
     .96    -.684   -.687   -.689   -.693   -.695   -.697   -.702   -.712   -.718   -.726   -.744
     .95    -.683   -.686   -.688   -.691   -.692   -.695   -.699   -.708   -.713   -.721   -.737
     .94    -.682   -.685   -.686   -.689   -.691   -.693   -.697   -.705   -.709   -.717   -.732
     .93    -.681   -.684   -.685   -.688   -.689   -.691   -.695   -.703   -.707   -.713   -.727
     .92    -.681   -.683   -.685   -.687   -.688   -.690   -.693   -.701   -.704   -.710   -.723
     .91    -.680   -.683   -.684   -.686   -.687   -.689   -.692   -.699   -.702   -.708   -.720
     .90    -.680   -.682   -.683   -.685   -.686   -.688   [illegible]  -.697   -.700   -.704   -.717
     .88    -.679   -.681   -.682   -.684   -.685   -.686   -.689   -.695   -.697   -.702   -.713
     .85    -.679   -.680   -.681   -.682   -.683   -.685   -.687   -.691   -.694   -.698   -.707
     .80    -.677   -.679   -.679   -.681   -.681   -.682   -.684   -.688   -.690   -.693   -.700
     .75    -.677   -.678   -.678   -.679   -.680   -.681   -.682   -.685   -.686   -.689   -.694
     .70    -.676   -.677   -.677   -.678   -.679   -.679   -.680   -.682   -.683   -.685   -.690
     .65    -.676   -.676   -.677   -.677   -.677   -.678   -.678   -.680   -.681   -.682   -.686
     .60    -.675   -.676   -.676   -.676   -.676   -.677   -.677   -.678   -.679   -.680   -.682
     .50    -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675


ERRATA

The Annals of Mathematical Statistics
Volume VI, No. 1, March, 1935

On page 25, in Directions for Use of the Tables, $p = q$ should read $p \ge q$;
$Q_1 = x_1 + qn$ should read $Q_1 =$ [illegible]; $D_9 = x_9 + qn$ should read
$D_9 =$ [illegible]. In the tables of values of $x_1$ under $p = .97$, $n = 25$,
instead of −.784 the number should be −.754.




                                        Values of x_9

     p \ n   2000    1000     750     500     400     300     200     100      75      50      25

     .99    1.307   1.318   1.325   1.336   1.344   1.356   1.378   [illegible]  1.481   * See Auxiliary Table
     .98    1.299   1.307   1.311   1.318   1.323   1.330   1.343   1.376   1.396   * See Auxiliary Table
     .97    1.295   1.301   1.304   1.310   1.314   1.319   1.329   1.353   1.367   * See Auxiliary Table
     .96    1.293   1.298   1.301   1.306   1.309   1.313   1.321   1.341   1.352   * See Auxiliary Table
     .95    1.292   1.296   1.299   1.303   1.305   1.309   1.316   1.332   1.342   * See Auxiliary Table
     .94    1.291   1.295   1.297   1.300   1.303   1.306   1.312   1.327   1.335   * See Auxiliary Table
     .93    1.290   1.293   1.295   1.298   1.301   1.304   1.309   1.322   1.329   1.342   1.374
     .92    1.289   1.292   1.294   1.297   1.299   1.302   1.307   1.318   1.325   1.336   1.365
     .91    1.289   1.292   1.293   1.296   1.298   1.300   1.305   1.315   1.321   1.331   1.357
     .90    1.288   1.291   1.292   1.295   1.296   1.299   1.303   1.313   1.318   1.325   1.351
     .88    1.287   1.290   1.291   1.293   1.295   1.297   1.300   1.309   1.313   1.321   1.341
     .85    1.286   1.288   1.289   1.291   1.292   1.294   1.297   1.304   1.308   1.314   1.330
     .80    1.285   1.287   1.288   1.289   1.290   1.291   1.293   1.298   1.301   1.306   1.317
     .75    1.284   1.285   1.286   1.287   1.288   1.289   1.291   1.294   1.297   1.300   1.308
     .70    1.284   1.285   1.285   1.286   1.286   1.287   1.288   1.291   1.293   1.295   1.301
     .65    1.283   1.284   1.284   1.285   1.285   1.286   1.286   1.288   1.290   1.292   1.296
     .60    1.283   1.283   1.283   1.284   1.284   1.284   1.285   1.286   1.287   1.288   1.291
     .50    1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282






                              Auxiliary Table

     p \ n     60      50      40      35      30      25      20

     .99    1.525   1.575   1.663   1.740   1.871   2.149   3.209
     .98    1.416   1.435   1.455   1.488   1.520   1.568   1.652
     .97    1.381   1.394   1.413   1.433   1.445   1.472   1.514
     .96    1.362   1.372   1.387   1.397   1.410   1.428   1.457
     .95    1.350   1.359   1.370   1.378   1.389   1.405   1.425
     .94    1.336   1.34[illegible]  1.359   1.366   1.375   1.387   1.405





INEQUALITIES AMONG AVERAGES 

By Nilan Norris

Numerous inequalities among averages of various types are condensed in the
monotonic character of the function

$$\phi(t) = \left( \frac{\sum x_i^t}{n} \right)^{1/t}$$

of the positive numbers $x_1, x_2, \ldots, x_n$, not all equal each to each. For $t = -1$
this function is the harmonic mean; for $t = 0$ it is the geometric mean; for $t = 1$
the arithmetic mean; and for $t = 2$ the root mean square. The relations
among these four means which customarily are proved by special and disconnected
methods appear easily as applications of the theorem that $\phi(t)$ is
an increasing function of $t$. That is, for any values of $t_1$ and $t_2$ such that
$-\infty < t_1 < t_2 < +\infty$, it will be true that $\phi(t_1) < \phi(t_2)$. Several proofs of this
theorem have been published, many of them very complex. An extremely simple
proof is herewith presented. 1
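A small numerical illustration may help fix ideas. The sketch below evaluates the function for a hypothetical sample, taking the value at $t = 0$ to be the geometric mean (the limiting value), and checks that the four classical means appear in increasing order; the function name and the sample are illustrative only.

```python
from math import exp, log

def power_mean(xs, t):
    """phi(t) = (sum(x**t) / n) ** (1/t), with the geometric mean as the value at t = 0."""
    n = len(xs)
    if t == 0:
        return exp(sum(log(x) for x in xs) / n)
    return (sum(x**t for x in xs) / n) ** (1.0 / t)

xs = [1.0, 2.0, 4.0]                      # hypothetical positive observations
harmonic = power_mean(xs, -1)
geometric = power_mean(xs, 0)
arithmetic = power_mean(xs, 1)
root_mean_square = power_mean(xs, 2)
print(harmonic, geometric, arithmetic, root_mean_square)

# phi(t) evaluated on a grid of t from -5 to 5 should increase steadily.
grid = [k / 4 for k in range(-20, 21)]
values = [power_mean(xs, t) for t in grid]
print(all(u < v for u, v in zip(values, values[1:])))
```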

That $\phi(t)$, $\phi'(t)$ and $\phi''(t)$ all exist and are continuous for all real values of $t$
may be shown by expanding each of the quantities $x_i^t$ in a series of powers of $t$
and considering the remainders after each of the first three terms. The ordinary
rule for evaluating forms reducing to 0/0, which requires the function under
consideration to be continuous and to have at least a continuous first derivative
for $t = 0$, may then be applied to $[\log \phi(t)]/t$ to show that $\phi(0)$ is the geometric
mean. It is clear that $\phi(-\infty)$ and $\phi(+\infty)$ are respectively the least and the
greatest of the $x_i$. This fact and the monotonic property of $\phi(t)$ make it evident
that for each real value of $t$, the function may be regarded as an average in the
usual sense that it lies within the range of the observations.

For a simple demonstration of the increasing character of $\phi(t)$, consider the
auxiliary function

$$F(t) = t^2\, \frac{\phi'(t)}{\phi(t)} = t^2\, \frac{d}{dt} \left[ \frac{1}{t} \log \frac{\sum x^t}{n} \right] = \frac{t \sum x^t \log x}{\sum x^t} - \log \frac{\sum x^t}{n}.$$

It is clear that $\phi'(t)$ has the same sign as $F(t)$. The theorem will be proved by
showing that the sign of $F(t)$ is positive for all values of $t$ except zero, when
$\phi'(t)$ vanishes.


1 Professor Harold Hotelling rendered invaluable assistance in condensing for publica¬ 
tion the material herein presented from a more extended study of generalized mean value 
functions 




Differentiating the last expression with respect to $t$, one obtains upon simplification

$$F'(t) = \frac{t}{\left(\sum x^t\right)^2} \left[ \left(\sum x^t\right) \left(\sum x^t \log^2 x\right) - \left(\sum x^t \log x\right)^2 \right].$$

By Cauchy's inequality (known as Schwarz’ inequality when applied to integrals
instead of sums), the expression in square brackets is positive. Hence $F'(t)$
has the same sign as $t$. Consequently $F(t)$, since it diminishes for negative
values of $t$ and increases for positive values, has a minimum for $t = 0$. But by
direct substitution, $F(0) = 0$. It follows that $F(t)$ and $\phi'(t)$ are positive for all
values of $t$ other than zero. Therefore $\phi(t)$ is an increasing function.

By direct general methods it is possible to show that

$$\phi'(0) = \left(\prod x\right)^{1/n} \cdot \frac{1}{2n^2} \left[ n \sum (\log x)^2 - \left(\sum \log x\right)^2 \right].$$

This expression obviously vanishes only when $n \sum (\log x)^2 = \left(\sum \log x\right)^2$, a condition
which is satisfied only in the trivial case when $x_1 = x_2 = \cdots = x_n$.
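The behaviour of the auxiliary function and the expression for the derivative at the origin can be checked numerically. The sketch below assumes the forms $F(t) = t \sum x^t \log x / \sum x^t - \log(\sum x^t / n)$ and $\phi'(0) = (\prod x)^{1/n} [n \sum (\log x)^2 - (\sum \log x)^2] / (2n^2)$, and compares the latter with a central difference; the sample and the step size are arbitrary choices made for illustration.

```python
from math import exp, log

def phi(xs, t):
    """The mean value function, with the geometric mean as the value at t = 0."""
    n = len(xs)
    if t == 0:
        return exp(sum(log(x) for x in xs) / n)
    return (sum(x**t for x in xs) / n) ** (1.0 / t)

def F(xs, t):
    """Auxiliary function F(t) = t**2 * phi'(t) / phi(t) in its simplified form."""
    s = sum(x**t for x in xs)
    return t * sum(x**t * log(x) for x in xs) / s - log(s / len(xs))

def phi_prime_at_zero(xs):
    """Closed form for phi'(0)."""
    n = len(xs)
    logs = [log(x) for x in xs]
    geometric_mean = exp(sum(logs) / n)
    return geometric_mean * (n * sum(v * v for v in logs) - sum(logs) ** 2) / (2 * n * n)

xs = [1.0, 2.0, 4.0]                      # hypothetical positive observations
print(F(xs, 0.0))                         # vanishes at t = 0
print([F(xs, t) > 0 for t in (-3.0, -0.5, 0.5, 3.0)])

h = 1e-4                                  # step for the central difference
numeric = (phi(xs, h) - phi(xs, -h)) / (2 * h)
print(numeric, phi_prime_at_zero(xs))
```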

A proof exactly parallel to that given above may be applied to integrals or,
more generally, to Stieltjes integrals. The monotonic increasing character of

$$\left[ \int_{x=0}^{\infty} x^t\, d\psi(x) \right]^{1/t}$$

appears in this way if one assumes that $\psi(x)$ is a non-decreasing
function integrable in the Riemann-Stieltjes sense, such that $\psi(\infty) - \psi(0) = 1$,
and such that $\int_{x=0}^{\infty} x^t\, d\psi(x)$ exists for every real value of $t$. In terms of statistical
theory, this consideration extends the theorem from samples to populations of a
very general character.

Proof of the increasing character of $\phi(t)$ has also been derived from Hölder’s
inequality, the demonstration being expressed in terms of Stieltjes integrals. 2
The simplest general proof of the monotonic attribute of $\phi(t)$ heretofore published
appears to be that of Paul Lévy. 3 As early as 1840 Bienaymé 4 presented a
generalized form of $\phi(t)$, namely,

$$\left( \frac{c_1 a_1^t + c_2 a_2^t + \cdots + c_n a_n^t}{c_1 + c_2 + \cdots + c_n} \right)^{1/t},$$

and announced, without proof, its increasing character. In 1858 a proof of the
monotonic quality of $\phi(t)$ for special cases was published by Schlömilch. 5 Of


2 J. Shohat, “Stieltjes Integrals in Mathematical Statistics,” Annals of Mathematical
Statistics (American Statistical Association, Ann Arbor, 1930), Vol. 1, No. 1, p. 84.

3 Calcul des Probabilités (Gauthier-Villars et Cie., Paris, 1925), pp. 157 f.

4 Jules Bienaymé, Société Philomatique de Paris, Extraits des Procès-Verbaux des Séances
Pendant L’Année 1840 (Imprimerie D’A. René et Cie., Paris, 1841), Séance du 13 juin 1840,
p. 68.

5 O. Schlömilch, “Ueber Mittelgrössen verschiedener Ordnungen,” Zeitschrift für Mathematik
und Physik (B. G. Teubner, Leipzig, 1858), Vol. 3, pp. 303 f.





the more recent general proofs of the increasing character of $\phi(t)$ which have
appeared, those of Jensen, 6 Pólya, 7 Jessen, 8 and Carathéodory 9 may be mentioned.
A recent application of $\phi(t)$ to index number theory is that of Professor
John B. Canning. 10

Vassar College. 


6 J. L. W. V. Jensen, “Sur Les Fonctions Convexes Et Les Inégalités Entre Les Valeurs
Moyennes,” Acta Mathematica (Beijers Bokförlagsaktiebolag, Stockholm, 1905), Vol. 30,
pp. 183-185.

7 G. Pólya and G. Szegö, Aufgaben und Lehrsätze Aus Der Analysis (Julius Springer,
Berlin, 1925), Vol. I, pp. 54 f. and 210.

8 Børge Jessen, “Bemaerkninger om konvekse Funktioner og Uligheder imellem Middelvaerdier,”
Matematisk Tidsskrift (Charles Johansens Bogtrykkeri, Copenhagen, 1931),
No. 2, 1931, pp. 26-28.

9 Attributed to Professor Constantin Carathéodory in an unpublished manuscript of
Professor Harold Hotelling.

10 “A Theorem Concerning a Certain Family of Averages of a Certain Type of Frequency
Distribution,” a paper presented before a joint meeting of the American Statistical Association
and the Econometric Society at Berkeley, California, June 22, 1934.



MATHEMATICAL EXPECTATION OF PRODUCT MOMENTS OF SAM¬ 
PLES DRAWN FROM A SET OF INFINITE POPULATIONS 


By Hyman M. Feldman 1 

Introduction 

In the second part of his investigations, “On the Mathematical Expectation of
Moments of Frequency Distributions,” 2 Tchouproff presented a method which
may be interpreted as sampling from a set of infinite univariate populations.
In the present paper this method is extended to the study of moments of product
moments of samples drawn from a set of infinite bivariate populations. It is
also shown how this method may be extended to populations of higher order by
deriving some of the simpler formulae for populations of three and four variables.

Tchouproff's method has been criticised 3 because of the complicated algebra.
On close examination it is found, however, that it is not the algebra which is
complicated but rather the symbolism. Tchouproff introduced a great variety
of symbols both in his derivations and in his results. As a consequence his work
seems very intricate. If, however, the number of symbols is reduced, and the
symbols themselves are simplified, which can be easily accomplished, the underlying
idea of Tchouproff's method is found to be very simple.

Quite a complete study of product moments of any bivariate population has
been made by Joseph Pepper in his “Studies in the Theory of Sampling.” 4 His
method is essentially an extension of Church's 5 method, in his studies of univariate
populations, to bivariate populations. He does not, however, derive any
generalized formulae. In the present study generalized formulae for both the
first moment and the variance of product moments of any order are obtained.

It may be noted here, that all of Pepper's formulae for any infinite population 
can be obtained from those of the present study as special cases, by assuming 
that all the populations in the set are identical. 


1 A dissertation presented to the Board of Graduate Studies of Washington University in 
partial fulfilment of the requirements for the degree of Doctor of Philosophy, June 1933. 

2 Biometrika, Vol. XXI, Dec. 1929, pp. 231-258.

3 Church, A. E. R., “On the Means and Squared Standard Deviations of Small Samples
from any Population,” Biometrika, Vol. XVIII, Nov., 1926, pp. 321-394.

4 Biometrika, Vol. XXI, Dec. 1929, pp. 231-258.

5 Church, A. E. R., “On the Means and Squared Standard Deviations of Small Samples
from any Population,” Biometrika, Vol. XVIII, Nov., 1926, pp. 321-394.



Chapter I. Notations and Definitions 

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be $n$ bivariate populations each following
any law of distribution whatever. The product moment of order $a$ in $X$ and $b$
in $Y$ of the $k$th population will be denoted by $P^k_{ab}$. It is defined as

$$P^k_{ab} = E(X_k - a_k)^a (Y_k - b_k)^b \tag{1.11}$$

where

$$a_k = E(X_k), \qquad b_k = E(Y_k), \tag{1.12}$$

and where the symbol $E$ signifies the expected value or the mathematical expectation
of a quantity.

Regarding each of the $n$ populations of the set as infinite, 6 samples of $n$ are
drawn, each member of a sample from one of the $n$ populations. 7 The individual
which is drawn from the $k$th population will be denoted by $(x_k, y_k)$; and the
product moment of order $a$ in $x$ and $b$ in $y$, of such a sample will be denoted by
$p_{ab}$. This product moment may then be defined as

$$p_{ab} = n^{-1} S (x_k - \bar x)^a (y_k - \bar y)^b \tag{1.13}$$

where

$$\bar x = n^{-1} S x_k, \qquad \bar y = n^{-1} S y_k. \tag{1.14}$$

The symbols $a$ and $b$ will now be defined by the equations

$$a = n^{-1} S a_k, \qquad b = n^{-1} S b_k. \tag{1.15}$$

Obviously

$$E(\bar x) = E(n^{-1} S x_k) = n^{-1} S E(X_k) = n^{-1} S a_k = a. \tag{1.16}$$

Similarly $E(\bar y) = b$. That is, the mathematical expectation of the mean of
such a sample as was described above is equal to the average of the means of all
the populations. 8

In order to make the equations as compact as possible the following additional
symbols will be employed:

$$x_k - a_k = u_k, \qquad \bar x - a = \bar u, \qquad u_k - \bar u = U_k,$$

$$y_k - b_k = v_k, \qquad \bar y - b = \bar v, \qquad v_k - \bar v = V_k, \tag{1.17}$$

also $a_k - a = A_k$, $b_k - b = B_k$.

From the above definitions it easily follows that

$$E(u_k) = E(v_k) = E(U_k) = E(V_k) = E(\bar u) = E(\bar v) = 0. \tag{1.18}$$


(1.18) 


6 The term infinite is used here in the probability sense. It is defined very clearly by
Church in his “Means and Squared Standard Deviations of Small Samples,” Biometrika,
Vol. XVIII, Nov., 1926, p. 322.

7 It may be easily shown that this is equivalent to drawing a sample of n from a set of
any finite number of populations. The number drawn from each population, however,
must be specified. See Biometrika, Vol. XIII, 1920-21, p. 295, footnote.

8 This, of course, is a result of the Lexis Theory, for Poisson and Lexis Series.





The notation is now completed with the definition of the symbol $Q_{ij}$ by the
equation:

$$Q_{ij} = S(a_k - a)^i (b_k - b)^j = S A_k^i B_k^j. \tag{1.19}$$

Chapter II. The Mathematical Expectation of $p_{ab}$

The mathematical expectation of $p_{ab}$ will be denoted by $\bar p_{ab}$. In the terminology
of moments this would be called the mean or first moment of the distribution
of $p_{ab}$.


1. The Mathematical Expectation of $p_{11}$. According to the above notation
the expected value of $p_{11}$ is $\bar p_{11}$. By definition

$$\bar p_{11} = E(p_{11}) = E\, n^{-1} S (x_i - \bar x)(y_i - \bar y), \tag{2.11}$$

and obviously $E\, n^{-1} S (x_i - \bar x)(y_i - \bar y) = n^{-1} S E (x_i - \bar x)(y_i - \bar y)$.

Writing

$$x_i - \bar x = [(x_i - a_i) - (\bar x - a)] + [a_i - a] = U_i + A_i,$$

$$y_i - \bar y = [(y_i - b_i) - (\bar y - b)] + [b_i - b] = V_i + B_i,$$

equation (2.11) may be written as

$$\bar p_{11} = n^{-1} S E (U_i + A_i)(V_i + B_i) = n^{-1} S E(U_i V_i) + n^{-1} S A_i E(V_i) + n^{-1} S B_i E(U_i) + n^{-1} S E(A_i B_i).$$

Since for any given population $A_i$ and $B_i$ are constants, it follows that
$E(A_i B_i) = A_i B_i$. Hence

$$n^{-1} S E(A_i B_i) = n^{-1} S A_i B_i = n^{-1} Q_{11}.$$

Making use of (1.18), it is seen that the terms $n^{-1} S A_i E(V_i)$ and $n^{-1} S B_i E(U_i)$
are zero. The only term left to evaluate is therefore $n^{-1} S E(U_i V_i)$. Since $U_i$
and $V_i$ are symmetric functions of the corresponding small letters, their product
is symmetric in $u_i v_i$. There is therefore no loss in generality if attention is
concentrated on a single subscript, say 1.

We may therefore write

$$n^{-1} S E(U_i V_i) = n^{-1} E(U_1 V_1) + n^{-1} \underset{2}{S}\, E(U_i V_i).\,^*$$

Remembering that $U_i = u_i - \bar u = u_i - n^{-1} S u_i$, we may write,

$$U_i = u_i - \bar u = u_i - n^{-1}(u_1 + u_2 + \cdots + u_n) = n^{-1} [n_1 u_i - (u_1 + u_2 + \cdots + u_{i-1} + u_{i+1} + \cdots + u_n)]$$

* The 2 at the bottom of the S simply indicates that the summation begins with i = 2.



MATHEMATICAL EXPECTATION OF PRODUCT MOMENTS 




where $n_1 = n - 1$. In general, $n_i$ will denote the number $n - i$. Similarly

$$V_i = n^{-1}[n_1 v_i - (v_1 + v_2 + \cdots + v_{i-1} + v_{i+1} + \cdots + v_n)].$$

Thus

$$n^{-1} S E(U_i V_i) = n^{-3} E(n_1 u_1 - u_2 - \cdots - u_n)(n_1 v_1 - v_2 - \cdots - v_n)$$
$$\quad + n^{-3} \underset{2}{S} E(n_1 u_i - u_1 - \cdots - u_{i-1} - u_{i+1} - \cdots - u_n)(n_1 v_i - v_1 - \cdots - v_{i-1} - v_{i+1} - \cdots - v_n).$$

When the right hand side of the last equation is expanded the only terms which
appear are of the form $E(u_i v_i)$ and $E(u_i v_j)$. The last one must vanish, for $u_i$
and $v_j$ are independent and hence $E(u_i v_j) = E(u_i)E(v_j) = 0$. From the last
equation above it is easily seen that the coefficient of $E(u_1 v_1)$ is

$$n^{-3}(n_1^2 + n_1) = n^{-3} n_1 (n_1 + 1) = n^{-2} n_1;$$

and because of the symmetry this is obviously the coefficient of any term of
that form. Hence

$$n^{-1} S E(U_i V_i) = n^{-2} n_1 S E(u_i v_i).$$

Since $u_i = x_i - a_i$, $v_i = y_i - b_i$, then

$$E(u_i v_i) = E(x_i - a_i)(y_i - b_i) = E(X_i - a_i)(Y_i - b_i) = P^i_{11},$$

and in general,

$$E(u_i^a v_i^b) = P^i_{ab}. \tag{2.12}$$

We thus get the formula

$$\bar p_{11} = n^{-2} n_1 S P^i_{11} + n^{-1} Q_{11}. \tag{1}$$

Now suppose all the n populations are identical. Then all the A's and also
all the B's vanish and therefore $Q_{11} = 0$. The formula (1) thus becomes

$$\bar p_{11} = \frac{n-1}{n} P_{11}. \tag{1'}$$

This is exactly Pepper's formula for $\bar p_{11}$ for an infinite population. 9
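Formula (1) can be checked exactly on small discrete populations. The following is a minimal sketch, not part of the original paper: the three populations in `POPS` and all function names are hypothetical, and one observation is assumed to be drawn independently from each population. It enumerates every possible sample, computes the expectation of $p_{11}$ in exact rational arithmetic, and compares it with the right hand side of (1).

```python
from fractions import Fraction
from itertools import product

# Hypothetical populations, each a list of (x, y, probability) triples.
POPS = [
    [(0, 0, Fraction(1, 2)), (1, 2, Fraction(1, 2))],
    [(1, 0, Fraction(1, 3)), (3, 1, Fraction(2, 3))],
    [(2, 5, Fraction(1, 4)), (0, 1, Fraction(3, 4))],
]

def means(pop):
    """Population means (a_i, b_i)."""
    return (sum(p * x for x, _, p in pop), sum(p * y for _, y, p in pop))

def P(pop, a, b):
    """Population product moment P_ab about the population's own means."""
    mx, my = means(pop)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in pop)

def expected_sample_moment(pops, a, b):
    """E[p_ab], enumerating every sample made of one draw per population."""
    n = len(pops)
    total = Fraction(0)
    for draw in product(*pops):
        prob = Fraction(1)
        for _, _, p in draw:
            prob *= p
        xbar = Fraction(sum(x for x, _, _ in draw), n)
        ybar = Fraction(sum(y for _, y, _ in draw), n)
        total += prob * sum((x - xbar) ** a * (y - ybar) ** b
                            for x, y, _ in draw) / n
    return total

def formula_1(pops):
    """Right hand side of (1): n^-2 n_1 S P_11 + n^-1 Q_11."""
    n = len(pops)
    mu = [means(pop) for pop in pops]
    a_bar = sum(m[0] for m in mu) / n
    b_bar = sum(m[1] for m in mu) / n
    q11 = sum((m[0] - a_bar) * (m[1] - b_bar) for m in mu)
    return Fraction(n - 1, n * n) * sum(P(pop, 1, 1) for pop in pops) + q11 / n

exact = expected_sample_moment(POPS, 1, 1)
predicted = formula_1(POPS)
print(exact == predicted)
```

Exact fractions are used rather than simulation so that agreement is an identity and not a statistical approximation.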
2. The Mathematical Expectation of $p_{21}$. By definition

$$\bar p_{21} = E\,n^{-1} S(x_i - \bar x)^2 (y_i - \bar y). \tag{2.21}$$

9 Biometrika, Vol. XXI, p. 233, Eq. A, N = ∞. As was already stated in the introduction,
all the formulae of the present study reduce to Pepper's when the above assumption
is made.








Proceeding as above it is seen that

$$E\,n^{-1} S(x_i - \bar x)^2 (y_i - \bar y) = n^{-1} S E(x_i - \bar x)^2 (y_i - \bar y) = n^{-1} S E(U_i + A_i)^2 (V_i + B_i)$$
$$= n^{-1} S E(U_i^2 V_i) + 2 n^{-1} S E(U_i V_i A_i) + n^{-1} S E(U_i^2 B_i) + n^{-1} S E(V_i A_i^2) + 2 n^{-1} S E(A_i B_i U_i) + n^{-1} S E(A_i^2 B_i). \tag{2.22}$$

It is quite evident that the two terms before the last vanish. To evaluate the
remaining terms, we employ the reasoning of section 1 of this chapter and write:

$$S E(U_i^2 V_i) = E(U_1^2 V_1) + \underset{2}{S} E(U_i^2 V_i)$$
$$= n^{-3} E(n_1 u_1 - u_2 - \cdots)^2 (n_1 v_1 - v_2 - \cdots) + n^{-3} \underset{2}{S} E(n_1 u_i - u_1 - \cdots)^2 (n_1 v_i - v_1 - \cdots).^{*}$$

Since terms of the form $E(u_i^2 v_j)$ vanish, only the coefficient of the term $E(u_i^2 v_i)$
must be found. Again considering the subscript 1, the coefficient of $E(u_1^2 v_1)$ is
easily found from the last equation to be

$$n^{-3}(n_1^3 - n_1) = n^{-3} n_1 (n_1 + 1)(n_1 - 1) = n^{-2} n_1 n_2.$$

Thus

$$n^{-1} S E(U_i^2 V_i) = n^{-3} n_1 n_2 S E(u_i^2 v_i) = n^{-3} n_1 n_2 S P^i_{21}. \tag{2.23}$$


For the second term of (2.22) we have

$$S E(U_i V_i A_i) = n^{-2} E(n_1 u_1 - u_2 - \cdots)(n_1 v_1 - v_2 - \cdots) A_1 + n^{-2} \underset{2}{S} E(n_1 u_i - u_1 - \cdots)(n_1 v_i - v_1 - \cdots) A_i.$$

The coefficient of $E(u_1 v_1)$ in the first term of the right hand side of the last
equation is $n^{-2} n_1^2 A_1$. In the second term it is $n^{-2} \underset{2}{S} A_i = -n^{-2} A_1$, since $S A_i = 0$.
It therefore follows that

$$2 n^{-1} S E(U_i V_i A_i) = 2 n^{-2} n_2 S P^i_{11} A_i. \tag{2.24}$$

Quite similarly

$$n^{-1} S E(U_i^2 B_i) = n^{-2} n_2 S P^i_{20} B_i, \tag{2.25}$$

and it is obvious that

$$n^{-1} S E(A_i^2 B_i) = n^{-1} Q_{21}. \tag{2.26}$$


* Note that the u which has the coefficient $n_1$ does not occur among the u's which have
the negative sign.





We thus get the formula

$$\bar p_{21} = n^{-3} n_1 n_2 S P^i_{21} + n^{-2} n_2 S(2 P^i_{11} A_i + P^i_{20} B_i) + n^{-1} Q_{21}. \tag{2}$$

3. The Mathematical Expectation of $p_{31}$ and $p_{22}$.

$$\bar p_{31} = E\,n^{-1} S(x_i - \bar x)^3 (y_i - \bar y) = n^{-1} S E(x_i - \bar x)^3 (y_i - \bar y) = n^{-1} S E(U_i + A_i)^3 (V_i + B_i)$$
$$= n^{-1} S\{E(U_i^3 V_i + U_i^3 B_i + 3 U_i^2 V_i A_i + 3 U_i^2 A_i B_i + 3 U_i V_i A_i^2 + 3 U_i A_i^2 B_i + V_i A_i^3 + A_i^3 B_i)\}. \tag{2.31}$$

The two terms before the last are zero. The last term is

$$n^{-1} S E(A_i^3 B_i) = n^{-1} Q_{31}. \tag{2.32}$$

By (2.23) and (2.24) and some slight manipulation

$$3 n^{-1} S E(U_i^2 A_i B_i + U_i V_i A_i^2) = 3 n^{-2} n_2 S(P^i_{20} A_i B_i + P^i_{11} A_i^2) + 3 n^{-3}(Q_{11} S P^i_{20} + Q_{20} S P^i_{11}), \tag{2.33}$$

and by (2.22)

$$n^{-1} S E(U_i^3 B_i + 3 U_i^2 V_i A_i) = n^{-4}(n_1^3 + 1) S(P^i_{30} B_i + 3 P^i_{21} A_i). \tag{2.34}$$

The only new term which is to be evaluated is $S E(U_i^3 V_i)$. This may be
written as follows:

$$S E(U_i^3 V_i) = n^{-4} S E(n_1 u_i - u_1 - \cdots)^3 (n_1 v_i - v_1 - \cdots).$$

When the right hand side is expanded it is found that the only non-vanishing
terms are of the form $E(u_i^3 v_i)$ and $E(u_i^2 u_j v_j)$. Only two subscripts, therefore,
have to be considered. Without any loss in generality these may be taken as
1 and 2, and the right hand side of the last equation may then be written as
follows:

$$S E(n_1 u_i - u_1 - \cdots)^3 (n_1 v_i - v_1 - \cdots) = E(n_1 u_1 - u_2 - \cdots)^3 (n_1 v_1 - v_2 - \cdots)$$
$$\quad + E(n_1 u_2 - u_1 - \cdots)^3 (n_1 v_2 - v_1 - \cdots) + \underset{3}{S} E(n_1 u_i - u_1 - u_2 - \cdots)^3 (n_1 v_i - v_1 - v_2 - \cdots).$$

From this last expansion it is easily seen that the coefficient of $E(u_1^3 v_1)$ is $(n_1^4 + n_1)$
and that of $E(u_i^2 u_j v_j)$, $(6 n_1^2 + 3 n_2) = 3(2 n_1^2 + n_2)$. We thus finally obtain

$$S E(U_i^3 V_i) = n^{-4}\{(n_1^4 + n_1) S E(u_i^3 v_i) + 3(2 n_1^2 + n_2) S E(u_i^2 u_j v_j)\}.$$

But by (2.12) $E(u_i^3 v_i) = P^i_{31}$, and since $u_i$ and $u_j$, and $u_i$ and $v_j$, are independent,
$E(u_i^2 u_j v_j) = E(u_i^2) E(u_j v_j) = P^i_{20} P^j_{11}$. Whence

$$S E(U_i^3 V_i) = n^{-4}\{(n_1^4 + n_1) S P^i_{31} + 3(2 n_1^2 + n_2) S P^i_{20} P^j_{11}\}. \tag{2.35}$$
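The two coefficients quoted before (2.35) can be confirmed by expanding the summed product mechanically. The sketch below is an illustration only (the helper names are hypothetical): it stores polynomials in $u_1, \ldots, u_n, v_1, \ldots, v_n$ as dictionaries from exponent tuples to integer coefficients, expands $S(n_1 u_i - \cdots)^3 (n_1 v_i - \cdots)$ for several small n, and reads off the coefficients of $u_1^3 v_1$ and $u_1^2 u_2 v_2$.

```python
from collections import defaultdict

def linear_form(n, i, offset):
    """n_1*z_i - sum_{j != i} z_j as {exponent tuple: coefficient}.

    Slots 0..n-1 hold u_1..u_n (offset 0); slots n..2n-1 hold v_1..v_n (offset n).
    """
    form = {}
    for j in range(n):
        exps = [0] * (2 * n)
        exps[offset + j] = 1
        form[tuple(exps)] = n - 1 if j == i else -1
    return form

def multiply(p, q):
    """Product of two polynomials stored as exponent-tuple dictionaries."""
    out = defaultdict(int)
    for ep, cp in p.items():
        for eq, cq in q.items():
            out[tuple(x + y for x, y in zip(ep, eq))] += cp * cq
    return out

def summed_expansion(n):
    """Expand S_i (n_1 u_i - ...)^3 (n_1 v_i - ...) over i = 1..n."""
    total = defaultdict(int)
    for i in range(n):
        u, v = linear_form(n, i, 0), linear_form(n, i, n)
        poly = multiply(multiply(multiply(u, u), u), v)
        for exps, coeff in poly.items():
            total[exps] += coeff
    return total

def monomial(n, u_powers, v_powers):
    """Exponent tuple of prod u_k^{u_powers[k]} v_k^{v_powers[k]} (0-based k)."""
    exps = [0] * (2 * n)
    for k, power in u_powers.items():
        exps[k] = power
    for k, power in v_powers.items():
        exps[n + k] = power
    return tuple(exps)

def coefficients(n):
    """Coefficients of u_1^3 v_1 and of u_1^2 u_2 v_2 in the summed expansion."""
    total = summed_expansion(n)
    return (total[monomial(n, {0: 3}, {0: 1})],
            total[monomial(n, {0: 2, 1: 1}, {1: 1})])

for n in range(3, 7):
    n1, n2 = n - 1, n - 2
    print(n, coefficients(n), (n1**4 + n1, 3 * (2 * n1**2 + n2)))
```

Each printed line shows the brute-force pair next to the closed forms $n_1^4 + n_1$ and $3(2 n_1^2 + n_2)$ for comparison.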





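From (2.31) and the succeeding equations we finally get

$$\bar p_{31} = n^{-5}\{(n_1^4 + n_1) S P^i_{31} + 3(2 n_1^2 + n_2) S P^i_{20} P^j_{11}\}$$
$$\quad + n^{-4}\{(n_1^3 + 1) S(P^i_{30} B_i + 3 P^i_{21} A_i)\} + 3 n^{-3}\{(n_1^2 - 1) S(P^i_{20} A_i B_i + P^i_{11} A_i^2) + Q_{11} S P^i_{20} + Q_{20} S P^i_{11}\} + n^{-1} Q_{31}. \tag{3}$$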
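As a hedged numerical illustration of formula (3), the sketch below evaluates both sides exactly for three hypothetical discrete populations (one draw from each; all names are illustrative and not from the paper). The cross sum $S P^i_{20} P^j_{11}$ is taken over ordered pairs $i \neq j$, which is the reading under which the coefficient $3(2 n_1^2 + n_2)$ was obtained above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical populations, each a list of (x, y, probability) triples.
POPS = [
    [(0, 0, Fraction(1, 2)), (1, 2, Fraction(1, 2))],
    [(1, 0, Fraction(1, 3)), (3, 1, Fraction(2, 3))],
    [(2, 5, Fraction(1, 4)), (0, 1, Fraction(3, 4))],
]

def means(pop):
    return (sum(p * x for x, _, p in pop), sum(p * y for _, y, p in pop))

def P(pop, a, b):
    """Population moment P_ab about the population's own means."""
    mx, my = means(pop)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in pop)

def expected_sample_moment(pops, a, b):
    """E[p_ab] by enumerating every sample (one draw per population)."""
    n = len(pops)
    total = Fraction(0)
    for draw in product(*pops):
        prob = Fraction(1)
        for _, _, p in draw:
            prob *= p
        xbar = Fraction(sum(x for x, _, _ in draw), n)
        ybar = Fraction(sum(y for _, y, _ in draw), n)
        total += prob * sum((x - xbar) ** a * (y - ybar) ** b
                            for x, y, _ in draw) / n
    return total

def formula_3(pops):
    """Right hand side of formula (3) for the expectation of p_31."""
    n = len(pops)
    n1, n2 = n - 1, n - 2
    mu = [means(pop) for pop in pops]
    a_bar = sum(m[0] for m in mu) / n
    b_bar = sum(m[1] for m in mu) / n
    A = [m[0] - a_bar for m in mu]
    B = [m[1] - b_bar for m in mu]

    def Q(i, j):
        return sum(a ** i * b ** j for a, b in zip(A, B))

    def S(a, b):
        return sum(P(pop, a, b) for pop in pops)

    # Ordered pairs i != j for the product P^i_20 P^j_11.
    cross = sum(P(pi, 2, 0) * P(pj, 1, 1)
                for i, pi in enumerate(pops)
                for j, pj in enumerate(pops) if i != j)
    t5 = (n1**4 + n1) * S(3, 1) + 3 * (2 * n1**2 + n2) * cross
    t4 = (n1**3 + 1) * sum(P(p, 3, 0) * b + 3 * P(p, 2, 1) * a
                           for p, a, b in zip(pops, A, B))
    t3 = 3 * ((n1**2 - 1) * sum(P(p, 2, 0) * a * b + P(p, 1, 1) * a * a
                                for p, a, b in zip(pops, A, B))
              + Q(1, 1) * S(2, 0) + Q(2, 0) * S(1, 1))
    return t5 / n**5 + t4 / n**4 + t3 / n**3 + Q(3, 1) / n

exact = expected_sample_moment(POPS, 3, 1)
predicted = formula_3(POPS)
print(exact == predicted)
```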

The derivation of $\bar p_{22}$ is so similar to that of $\bar p_{31}$, that it would be mere repetition
to go through the details again. We shall therefore merely write down the
formula for $\bar p_{22}$ which is

$$\bar p_{22} = n^{-5}\{(n_1^4 + n_1) S P^i_{22} + (2 n_1^2 + n_2) S(P^i_{20} P^j_{02} + 4 P^i_{11} P^j_{11})\}$$
$$\quad + 2 n^{-4}\{(n_1^3 + 1) S(P^i_{21} B_i + P^i_{12} A_i)\} + n^{-3}\{(n_1^2 - 1) S(P^i_{20} B_i^2 + 4 P^i_{11} A_i B_i + P^i_{02} A_i^2) + Q_{20} S P^i_{02} + Q_{02} S P^i_{20} + 4 Q_{11} S P^i_{11}\} + n^{-1} Q_{22}. \tag{4}$$


4. The Mathematical Expectation of the General Product Moment $p_{ab}$.
So far, formulae for the mathematical expectation of $p_{ab}$, for particular values
of a and b, have been derived. The method used in deriving these is, however,
perfectly general, and now that it has been sufficiently illustrated, it can be
easily generalized.

By definition we have

$$\bar p_{ab} = E[n^{-1} S(x_i - \bar x)^a (y_i - \bar y)^b].$$

Making use of the notation of Chapter I this may be written as

$$n \bar p_{ab} = E S(U_i + A_i)^a (V_i + B_i)^b = \underset{q,r=0}{\overset{a,\,b}{S}} C^a_q C^b_r \, \underset{i}{S} E(U_i^{a-q} V_i^{b-r} A_i^q B_i^r), \tag{2.41}$$

where

$$C^a_q = \frac{a!}{q!\,(a-q)!}, \qquad C^b_r = \frac{b!}{r!\,(b-r)!}.$$

Expressing the U's and V's in terms of the u's and v's and setting a − q = l,
b − r = m, we may write for a particular pair of values q and r:

$$n^{l+m} S E(U_i^l V_i^m A_i^q B_i^r) = S E(n_1 u_i - u_1 - \cdots)^l (n_1 v_i - v_1 - \cdots)^m A_i^q B_i^r. \tag{2.42}$$

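Consider, now, the general term in the expansion of the right hand side of
(2.42). It is of the form:

$$\frac{l!\,m!}{\Pi \alpha_h!\,\Pi \beta_h!}\,(-1)^{l+m} (-n_1)^{\alpha_h + \beta_h} E(u_{j_1}^{\alpha_1} v_{j_1}^{\beta_1} \cdots u_{j_k}^{\alpha_k} v_{j_k}^{\beta_k} A_i^q B_i^r),^{*} \tag{2.43}$$

where $\Pi \alpha_h! = \alpha_1!\,\alpha_2! \cdots \alpha_k!$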


* In this case, and also in the formulae that follow, whenever two or more indices 
appear in a summation, it will be understood that no two of them can have the same 
value simultaneously. 





For particular sets of values $j_1, j_2, \cdots j_k$, $\alpha_1, \alpha_2, \cdots \alpha_k$, and $\beta_1, \beta_2, \cdots \beta_k$, this
term will appear in every member of the summation of the right hand side of
(2.42), and its coefficient will differ only in the exponent of $(-n_1)$ and in the
subscript i of $A_i^q B_i^r$. Because of the symmetry there is no loss in generality if
we take for $j_1, j_2, \cdots j_k$ the first k integers. We now break up the summation
of the right hand side of (2.42) as follows:

$$\underset{i}{S} E(n_1 u_i - u_1 - \cdots)^l (n_1 v_i - v_1 - \cdots)^m A_i^q B_i^r = E(n_1 u_1 - u_2 - \cdots)^l (n_1 v_1 - v_2 - \cdots)^m A_1^q B_1^r$$
$$\quad + E(n_1 u_2 - u_1 - \cdots)^l (n_1 v_2 - v_1 - \cdots)^m A_2^q B_2^r + \cdots + E(n_1 u_k - u_1 - \cdots)^l (n_1 v_k - v_1 - \cdots)^m A_k^q B_k^r$$
$$\quad + \underset{i=k+1}{S} E(n_1 u_i - u_1 - \cdots)^l (n_1 v_i - v_1 - \cdots)^m A_i^q B_i^r. \tag{2.44}$$

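From (2.44) we easily get for the total coefficient (excluding the numerical
factor) the expression

$$\underset{h=1}{\overset{k}{S}} (-n_1)^{\alpha_h + \beta_h} A_h^q B_h^r + \underset{h=k+1}{\overset{n}{S}} A_h^q B_h^r.$$

Writing

$$\underset{k+1}{\overset{n}{S}} A_h^q B_h^r = \underset{1}{\overset{n}{S}} A_h^q B_h^r - \underset{1}{\overset{k}{S}} A_h^q B_h^r = Q_{qr} - \underset{1}{\overset{k}{S}} A_h^q B_h^r,$$

the general term, (2.43), together with the total coefficient, may then be written
as

$$\frac{(-1)^{l+m}\, l!\, m!}{\Pi \alpha_h!\, \Pi \beta_h!} \left\{ \underset{h=1}{\overset{k}{S}} [(-n_1)^{\alpha_h + \beta_h} - 1] A_h^q B_h^r + Q_{qr} \right\} E\, \Pi\, u_h^{\alpha_h} v_h^{\beta_h}.$$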

Since $u_i$ and $u_j$, $v_i$ and $v_j$, and $u_i$ and $v_j$ are independent while $u_i$ and $v_i$ are
not, we have:

I. $E\, \Pi\, u_h^{\alpha_h} v_h^{\beta_h} = \Pi\, E\, u_h^{\alpha_h} v_h^{\beta_h} = \Pi\, P^h_{\alpha_h \beta_h}$,

II. Any term in which $\alpha_h + \beta_h = 1$ must vanish.

From II it follows that the maximum number of subscripts which can appear
in any term in the expansion of (2.42), i.e. the upper limit of k, which will be
denoted by t, cannot exceed (l + m)/2. In fact when l + m is even, t = (l + m)/2,
while when l + m is odd, t is the largest integer less than (l + m)/2.

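Making use of (2.41), the equations following it, and the reasoning of the last
paragraph, we finally get the formula:

$$n(-n)^{a+b} \bar p_{ab} = (a!)(b!) \underset{q,r=0}{\overset{a,\,b}{S}} \frac{(-n)^{q+r}}{q!\,r!} \; \underset{k=1}{\overset{t}{S}} \; \underset{\alpha_h, \beta_h = 0,0}{\overset{a-q,\; b-r}{S}} \; \underset{j_1, \cdots, j_k}{S} \left\{ \underset{h=1}{\overset{k}{S}} [(-n_1)^{\alpha_h + \beta_h} - 1] A_{j_h}^q B_{j_h}^r + Q_{qr} \right\} \Pi \frac{P^{j_h}_{\alpha_h \beta_h}}{\alpha_h!\, \beta_h!}. \tag{5}$$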





The following restrictions on the α's and β's must be observed

(a) $\alpha_1 + \alpha_2 + \cdots + \alpha_k = a - q$

(b) $\beta_1 + \beta_2 + \cdots + \beta_k = b - r$

(c) $\alpha_h + \beta_h \neq 1$.

In case the n populations are identical (5) reduces as follows: For q = 0,
r = 0, $A_i^0 = 1$, $B_i^0 = 1$, and $Q_{00} = n$; while in every other case $A_i^q B_i^r = 0$,
$Q_{qr} = 0$. The summations with respect to q and r, therefore disappear.
Consider now the summations

$$\underset{j_1 = 1}{S} \; \underset{j_2 = 1}{S} \cdots \underset{j_k = 1}{S} P^{j_1}_{\alpha_1 \beta_1} P^{j_2}_{\alpha_2 \beta_2} \cdots P^{j_k}_{\alpha_k \beta_k}.$$

Since all the populations are the same we may drop the j by actually carrying
out the indicated summations. If, then, there are c repetitions among the k
pairs of integers $\alpha_h \beta_h$, in which $\alpha_1 \beta_1, \alpha_2 \beta_2, \cdots$ are repeated $l_1, l_2, \cdots l_c$
times respectively, then we have:

$$\underset{j_1 = 1}{S} \cdots \underset{j_k = 1}{S} \Pi P^{j_h}_{\alpha_h \beta_h} = \frac{k!\, C^n_k}{l_1!\, l_2! \cdots l_c!} \, \Pi P_{\alpha_h \beta_h}.$$


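We thus arrive at the following corollary: The mathematical expectation,
$\bar p_{ab}$, of the product moment, $p_{ab}$, in samples of n from a single infinite population
having any law of distribution is given by

$$n(-n)^{a+b} \bar p_{ab} = (a!)(b!) \underset{k=1}{\overset{t}{S}} \; \underset{\alpha_h, \beta_h}{S} \frac{k!\, C^n_k}{l_1!\, l_2! \cdots l_c!} \left\{ \underset{h=1}{\overset{k}{S}} [(-n_1)^{\alpha_h + \beta_h} - 1] + n \right\} \frac{\Pi P_{\alpha_h \beta_h}}{\Pi \alpha_h!\, \Pi \beta_h!}.^{*} \tag{5'}$$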
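The counting factor $k!\, C^n_k / (l_1!\, l_2! \cdots l_c!)$ used in the corollary is the number of distinct products obtained when k exponent pairs are attached to k different populations out of n. A small brute-force sketch (function names and the sample pairs are hypothetical) makes this concrete by enumerating all assignments and discarding duplicates.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

def distinct_terms(n, pairs):
    """Count distinct products P^{j_1}_{pair_1} ... P^{j_k}_{pair_k}
    when j_1..j_k run over k different populations out of n."""
    seen = set()
    for js in permutations(range(n), len(pairs)):
        # A product is fixed by which population carries which exponent pair.
        seen.add(frozenset(zip(js, pairs)))
    return len(seen)

def closed_form(n, pairs):
    """k! C(n, k) / (l_1! ... l_c!), the l's being multiplicities of the pairs."""
    k = len(pairs)
    repeats = 1
    for multiplicity in Counter(pairs).values():
        repeats *= factorial(multiplicity)
    return factorial(k) * comb(n, k) // repeats

pairs = [(2, 0), (2, 0), (1, 1)]  # hypothetical exponent pairs (alpha_h, beta_h)
print(distinct_terms(5, pairs), closed_form(5, pairs))
```

Pairs that occur only once have multiplicity 1 and therefore contribute a factor of 1 to the denominator.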


Note : In deriving these general formulae it was assumed that n > t. There 
is however, no loss in generality in this assumption. For, if l > n , we may 
suppose that, x n+ i = x n , 2 = • • • = x, =0, and hence = ... = = 0, 

and thus the above reasoning is still valid. 


5. Formulae for $\bar p_{41}$, $\bar p_{32}$, $\bar p_{51}$, $\bar p_{42}$, $\bar p_{33}$. Formulae for $\bar p_{ab}$ in which a + b = 5, 6,
7, 8 have been obtained. But for (a + b) > 6 these formulae become very long,
and since these will be of no use in the subsequent work, only those of order 5
and 6 are given below.

$$\bar p_{41} = n^{-6}\{(n_1^5 - n_1) S P^i_{41} + 2 n n_2^2 S(2 P^i_{30} P^j_{11} + 3 P^i_{21} P^j_{20})\}$$
$$\quad + n^{-5}\{(n_1^4 - 1) S(P^i_{40} B_i + 4 P^i_{31} A_i) + 6 n n_2 S(P^i_{20} B_i P^j_{20} + 2 P^i_{11} A_i P^j_{20})^{\dagger}\}$$
$$\quad + 2 n^{-4}\{(n_1^3 + 1) S(2 P^i_{30} A_i B_i + 3 P^i_{21} A_i^2) - 2 Q_{11} S P^i_{30} - 3 Q_{20} S P^i_{21}\}$$
$$\quad + 2 n^{-3}\{(n_1^2 - 1) S(2 P^i_{11} A_i^3 + 3 P^i_{20} A_i^2 B_i) + 2 Q_{30} S P^i_{11} + 3 Q_{21} S P^i_{20}\} + n^{-1} Q_{41}. \tag{6}$$

* This is a generalization of Pepper's results for N = ∞. See Biometrika, Vol. XXI,
pp. 231-240.

† The symbol $P_{11} A P_{20}$ is an abbreviation of the full term $(A_i + A_j)(P^i_{11} P^j_{20} + P^j_{11} P^i_{20})$.
Similar abbreviations will be used in the other formulae.
p M = n -« {(»* - n,)SPJ 2 + nnlS(P' i0 Pi 2 + 6P' il P[ l + 3Pi„Pi 2 )J 
+ n~ B {(»« - 1),S(2P;,P, + 3P; 2 A,) + 3nn 2 S(P5 0 P{ 1 P, + [P^PJi 
+ 4Pj 1 Pl 1 ].4,)} + n- 4 {(»? + 1 )S(P} c B; + GP^A.P, 

+ 3P; 2 Ai) - QuSPio - 60i,SPJx - 3Q*SPI,} + n-* {(/iI - 1)»S'(3P 23 A x B\ 
■ I GPnA,Bi + P'oiA 3 ,) -f" 3Q»SPi 0 4- GQjuSPJi 4~ QtoBPos I “I - ^ l Qz~ • (7) 

Pu = {(»* 4- nOSPj i 4- 5(n{ 4- »! + «OWoP{, 

4- 2PJxPJ,) - 10(2nJ -n 3 )SP' tl P’ 30 + 30(3«? 4- nO.SPJ.PJ.P^ } 4- »-•{(»? 
4- 1)S(PJ,B. 4- SPJxAO + 10(»J 4- l)S*[2Pi.Pi,B, 4- (2P$ 0 P{x 
4- SPJxPjoM.l - 10nn 2 <S*[2PJ o P* o B 2 + (2P; o Pi t 4- 3P.ixPi 0 )A,l} 

4- 5n~ 6 {(«! - l)S(Pi 0 A,B, + 2 PJxA;) 4- 6nn»S(P;,PJoA,B I 4- 2P‘ 0 PJ 1 ^f) 
4" QuS(Pl 0 4~ 6P 2 0P2 0) 4~ 2QwS(P' 3 1 4- 6P 2 0 P11) i 4~ 10/t -4 {(w J 
+ l)5(Pi„A!B, + Pi 1 At) - QnSPj, - <?«SPi,) 4- - 1).S(2P{ 0 A!B, 

4- Pi M 4- 2Q 31 SPio + Q-wBPi ,} 4- m-^ 5. . (8) 

p 42 = n- 7 {(n? 4- ni)SPl t + (»{ 4 - »! + b*)S(P;„PJ* 4- 8P.i,P{ t 
+ 6Pj 2 Pio) +4(2«; -«,)S(PJ 0 Pi'* + 3P 2t P 2 1) 4- «(3nJ + >h)S(Pl 0 Pl 0 P k ot 
4- 4Pi 0 P{ 1 P{ 1 )| 4- 2n 6 {(»? + lWJxB, 4- 2PJ 2 A,)4- 2(»J 4- 1)-S[(2PJ 0 P{, 
4- 3P 21 P^o)B, 4- (P£ 0 Po 2 4- 6P21P11 4-3Pi,PioM,l - 2«» 2 ,S[(2PJ 0 P 1 ', 

4- 3Pi,Pio)P ; 4-(PioPi 2 + 6Pi xP'x x + 3p; 2 Pi,M,] I 4- «- 5 {(»i - l)5(p; 0 B] 
+ SP' 31 A t B, + QPl i A 2 l ) 4-6nn 2 S[Pi 0 Pi,B? 4-4Pj 0 PixA.B, 4-(Pi 0 PJ 2 
4- 4P 1 ‘ 1 P 1 ' l )Aj] 4- QoxSCPio + 6P; 0 Pi 0 ) 4- 8Qu5(Pix + 3P; 0 P(x) 

4- 6P»S(PJ 2 4- PJoPo'i + 4PixPix)J + 4n -4 {(nJ 4- 1)S(PI,A,B; 
4-3PixA?P,4-Pi 2 Ai) -Q 12 SPJo - Sf^SPi, 4-QJo-SPi,} 
4-»-*{S(6 PJ,a;b; +8PiiAtBi 4- PJ 2 Ai) 4- Q«oSPJ, 4- SQs.SPix 
4-6Q22-SPi„} 4-n- 4 Q42. (9) 

pn — 7 {(w* 4” wi)SPJ 2 4" 3(«i 4" w* 4* w 2 )S{P 3i P\ >3 4~ 3P 22 P}x 

+ Pi»Pi„) - (2»i — n 2 ) S(Pi 0 P’ 03 4- 9Pi,Pi 2 ) 4 - 9(3mJ 4- n s )<S(PJ.P{ 1 P5 2 

* The repetition of this expression signifies that A and B factors are coupled only with
those P factors which have corresponding indices.





+ 4PJ X P{ yP\ i) J +3»-*{(»f + l)S(Pi,£f, + Pi t A,) + c n\ + 1) 5[(PJ„PJ, 
+ 6PJ,Pit +3P; 1 Pio)P. + (Pi,Pi, +3Pi 1 PJ,)il.] 

— ww*iS[(PJ 0 Po * + 6P2 iPJ 1 + 3PJ 2P2 o)Pj + (P 0 3P2 0 + bP 12 P11 
+ SPJtPi.M,]} + 3n- 6 {(nJ - l)5(PitB; + 3Pi 2 ^,P. + P\>A\) 

+ 3»tn*S[P; 0 P{ ,B? + (P^oPos + 4PI l Pi l )A,B, + PJtPi.A?)] 

+ s[q«(pj, +3Pi 0 Pi,) + 3^11 (P2 2 + p; 0 ps. + 4p; 1 p{ 1 ) + q»(p; 3 

+ 3P; 2 P;i)]| +n-«{(»? + 1)S(PJ 0 B? + 9P' il A,B 2 +9P; 1 AM, + PJ 1 ii!) 

- S(Qb.Pi, + 9Q12P21 +9QnPi. + QaoPi*)} +3»-*{(n? - l)S(Pi 0 A,B* x 

+ ZP\ X A\B\ + Pl t A*B,) + 5(Qi,PJo + SQaPJ, + Q 3 iPi 2 )J + «~ l Q 33 . (10) 

Chapter III. The Mathematical Expectation of the Variance of $p_{ab}$

1. The Symbols ${}_2m_{p_{ab}}$ and ${}_2M_{p_{ab}}$. Denoting the variance of $p_{ab}$ by ${}_2m_{p_{ab}}$ and
the mathematical expectation of ${}_2m_{p_{ab}}$ by ${}_2M_{p_{ab}}$, we have the definition,

$${}_2m_{p_{ab}} = \{n^{-1} S(x_i - \bar x)^a (y_i - \bar y)^b - \bar p_{ab}\}^2
 = n^{-2} S^2 (x_i - \bar x)^a (y_i - \bar y)^b - 2 n^{-1} \bar p_{ab} S(x_i - \bar x)^a (y_i - \bar y)^b + \bar p_{ab}^2,$$

and

$${}_2M_{p_{ab}} = E({}_2m_{p_{ab}}) = E\{n^{-2} S^2 (x_i - \bar x)^a (y_i - \bar y)^b - 2 n^{-1} \bar p_{ab} S(x_i - \bar x)^a (y_i - \bar y)^b + \bar p_{ab}^2\}$$
$$= n^{-2} E[S(x_i - \bar x)^{2a} (y_i - \bar y)^{2b}] + 2 n^{-2} E[S(x_i - \bar x)^a (x_j - \bar x)^a (y_i - \bar y)^b (y_j - \bar y)^b] - 2 n^{-1} \bar p_{ab} E[S(x_i - \bar x)^a (y_i - \bar y)^b] + \bar p_{ab}^2$$
$$= n^{-1} \bar p_{2a,2b} + 2 n^{-2} E[S(x_i - \bar x)^a (y_i - \bar y)^b (x_j - \bar x)^a (y_j - \bar y)^b] - \bar p_{ab}^2. \tag{3.11}$$

Before attempting to expand the right hand side of (3.11) for any values a, b
we shall derive the formula for ${}_2M_{p_{11}}$ to illustrate the procedure.

2. The Mathematical Expectation of ${}_2m_{p_{11}}$. By (3.11) we have

$${}_2M_{p_{11}} = n^{-1} \bar p_{22} + 2 n^{-2} E[S(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)] - \bar p_{11}^2. \tag{3.21}$$

The first term is given by (4) and the last by (1). The only new term is the
middle one. To expand it let us write it in terms of U and V. We then have:

$$n^{-2} S E[(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)] = n^{-2} S E[(U_i + A_i)(V_i + B_i)(U_j + A_j)(V_j + B_j)]$$
$$= n^{-2}\{S E[U_i V_i U_j V_j + (U_i V_i U_j B_j + U_j V_j U_i B_i) + (U_i V_i V_j A_j + U_j V_j V_i A_i) + (U_i V_i A_j B_j + U_j V_j A_i B_i)$$
$$\quad + (U_i V_j A_j B_i + U_j V_i A_i B_j) + U_i U_j B_i B_j + V_i V_j A_i A_j + \text{4 vanishing terms} + A_i B_i A_j B_j]\}. \tag{3.22}$$





The evaluation of the last term is very simple. For

$$S E(A_i B_i A_j B_j) = S(A_i B_i A_j B_j),$$

and from the elementary theory of symmetric functions we have:

$$S(A_i B_i A_j B_j) = \frac{S^2(A_i B_i) - S(A_i^2 B_i^2)}{2}.$$

Hence

$$S E(A_i B_i A_j B_j) = \frac{S^2(A_i B_i) - S(A_i^2 B_i^2)}{2} = \frac{Q_{11}^2 - Q_{22}}{2}. \tag{3.23}$$
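The symmetric-function identity behind (3.23) is easy to confirm on numbers. In this minimal sketch the deviations `A` and `B` are hypothetical values chosen to sum to zero, and the sum over pairs is taken with each unordered pair counted once.

```python
from fractions import Fraction
from itertools import combinations

A = [3, -1, -2, 0]  # hypothetical deviations A_i, summing to zero
B = [1, 4, -6, 1]   # hypothetical deviations B_i, summing to zero

def Q(i, j):
    """Q_ij = S A_k^i B_k^j."""
    return sum(a ** i * b ** j for a, b in zip(A, B))

# Left side: sum of A_i B_i A_j B_j over unordered pairs i < j.
pair_sum = sum(A[i] * B[i] * A[j] * B[j]
               for i, j in combinations(range(len(A)), 2))

# Right side: (Q_11^2 - Q_22) / 2.
closed = Fraction(Q(1, 1) ** 2 - Q(2, 2), 2)
print(pair_sum, closed)
```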


To expand the first term and also the remaining ones, we return to the u, v
notation defined in Chapter I. We then write

$$S E(U_i V_i U_j V_j) = n^{-4} S E[(n_1 u_i - u_1 - \cdots)(n_1 v_i - v_1 - \cdots)(n_1 u_j - u_1 - \cdots)(n_1 v_j - v_1 - \cdots)].$$

The only terms which can appear in the expansion of the right hand side of the
last equation have the following form:

$$E(u_i^2 v_i^2), \quad E(u_i^2 v_j^2), \quad E(u_i v_i u_j v_j),$$

i.e., exactly those which appear in the evaluation of $\bar p_{22}$. Remembering the
symmetry, there will be no loss in generality if we take for i and j the integers
1 and 2. To find the coefficients of the three characteristic terms, the above
summation may be broken up as follows:

$$n^4 S E(U_i V_i U_j V_j) = E[(n_1 u_1 - u_2 - \cdots)(n_1 v_1 - v_2 - \cdots)(n_1 u_2 - u_1 - \cdots)(n_1 v_2 - v_1 - \cdots)]$$
$$\quad + E\{[(n_1 u_1 - u_2 - \cdots)(n_1 v_1 - v_2 - \cdots) + (n_1 u_2 - u_1 - \cdots)(n_1 v_2 - v_1 - \cdots)] \underset{3}{S}(n_1 u_i - u_1 - \cdots)(n_1 v_i - v_1 - \cdots)\}$$
$$\quad + \underset{3}{S} E[(n_1 u_i - u_1 - \cdots)(n_1 v_i - v_1 - \cdots)(n_1 u_j - u_1 - \cdots)(n_1 v_j - v_1 - \cdots)]. \tag{3.24}$$


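Writing the three terms in a row and their coefficients from the three parts of
(3.24) in columns below these terms, we get the following scheme:

$$\begin{array}{l|ccc}
 & E(u_1^2 v_1^2) & E(u_1^2 v_2^2 + u_2^2 v_1^2) & E(u_1 v_1 u_2 v_2) \\ \hline
\text{First part} & n_1^2 & n_1^2 & (n_1^2 + 1)^2 \\
\text{Second part} & n_2 (n_1^2 + 1) & -2 n_1 n_2 & 2 n_2^3 \\
\text{Third part} & \dfrac{n_2 n_3}{2} & \dfrac{n_2 n_3}{2} & 2 n_2 n_3 \\ \hline
\text{Total coeff.} & \dfrac{n n_1 (2 n_1 - 1)}{2} & -\dfrac{n n_3}{2} & n(n_1^3 + n_1^2 - 3 n_1 + 3)
\end{array}$$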
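The column totals of the scheme can be recomputed directly. The sketch below is illustrative only and its function names are hypothetical. It uses the fact that in the linear form $n_1 z_i - \cdots$ the variable $z_k$ has coefficient $n_1$ when k = i and −1 otherwise, so the coefficient of each characteristic monomial in a bracket for the pair (i, j) is a simple product, and it sums these over all unordered pairs.

```python
from itertools import combinations

def c(n, i, k):
    """Coefficient of z_k in the linear form n_1*z_i - sum_{j != i} z_j."""
    return n - 1 if i == k else -1

def totals(n):
    """Total coefficients of u_1^2 v_1^2, u_1^2 v_2^2 and u_1 v_1 u_2 v_2
    in S (n_1 u_i - ...)(n_1 v_i - ...)(n_1 u_j - ...)(n_1 v_j - ...),
    summed over unordered pairs i < j (indices 0 and 1 play the roles 1 and 2)."""
    t1 = t2 = t3 = 0
    for i, j in combinations(range(n), 2):
        sq1 = c(n, i, 0) * c(n, j, 0)        # coefficient of z_1^2 in one pair of forms
        sq2 = c(n, i, 1) * c(n, j, 1)        # coefficient of z_2^2
        mixed = c(n, i, 0) * c(n, j, 1) + c(n, i, 1) * c(n, j, 0)  # of z_1 z_2
        t1 += sq1 * sq1      # u_1^2 from the u-forms times v_1^2 from the v-forms
        t2 += sq1 * sq2      # u_1^2 times v_2^2
        t3 += mixed * mixed  # u_1 u_2 times v_1 v_2
    return t1, t2, t3

def closed_forms(n):
    """Totals as given in the last row of the scheme."""
    n1, n3 = n - 1, n - 3
    return (n * n1 * (2 * n1 - 1) // 2,
            -(n * n3 // 2),
            n * (n1**3 + n1**2 - 3 * n1 + 3))

for n in range(3, 9):
    print(n, totals(n), closed_forms(n))
```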





With the aid of the above equations we finally get:

$$S E(U_i V_i U_j V_j) = n^{-4} \left\{ \frac{n n_1 (2 n_1 - 1)}{2} S P^i_{22} - \frac{n n_3}{2} S P^i_{20} P^j_{02} + n(n n_1^2 - 3 n_2) S P^i_{11} P^j_{11} \right\}.^{*}$$


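Proceeding in the same way we find:

$$S E(U_i V_i U_j B_j + U_j V_j U_i B_i) = n^{-3}(2 n_1^2 + n_2) S P^i_{21} B_i,$$
$$S E(U_i V_i V_j A_j + U_j V_j V_i A_i) = n^{-3}(2 n_1^2 + n_2) S P^i_{12} A_i,$$
$$S E(U_i V_i A_j B_j + U_j V_j A_i B_i) = n^{-2}\{-n n_2 S P^i_{11} A_i B_i + (n_1^2 + n_2) Q_{11} S P^i_{11}\},$$
$$S E(U_i V_j A_j B_i + U_j V_i A_i B_j) = n^{-2}\{2 n S P^i_{11} A_i B_i - Q_{11} S P^i_{11}\},$$
$$S E(U_i U_j B_i B_j + V_i V_j A_i A_j) = n^{-2}\{n S(P^i_{20} B_i^2 + P^i_{02} A_i^2) - \tfrac{1}{2} S(Q_{20} P^i_{02} + Q_{02} P^i_{20})\}.$$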

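Collecting terms and simplifying we finally get:

$${}_2M_{p_{11}} = n^{-4}\{n_1^2 S P^i_{22} + S(P^i_{20} P^j_{02} + 2 P^i_{11} P^j_{11}) - n_1^2 S(P^i_{11})^2\}$$
$$\quad + 2 n^{-3} n_1 \{S(P^i_{21} B_i + P^i_{12} A_i)\} + n^{-2}\{S(P^i_{20} B_i^2 + 2 P^i_{11} A_i B_i + P^i_{02} A_i^2)\}. \tag{11}$$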


Corollary 1. In case $X_i = Y_i$, i.e., when the set of populations are univariate,
(11) becomes

$${}_2M_{p_{20}} = n^{-4}\{n_1^2 S[P^i_{40} - (P^i_{20})^2] + 4 S P^i_{20} P^j_{20}\} + 4 n^{-3} n_1 S P^i_{30} A_i + 4 n^{-2} S P^i_{20} A_i^2. \tag{11'}$$

This is Tchouproff's formula for the expected value of the variance of samples
of n. 10

Corollary 2. In case the n populations are identical (11) becomes

$${}_2M_{p_{11}} = n^{-3} n_1 \{n_1 P_{22} + P_{20} P_{02} - n_2 P_{11}^2\}. \tag{11''}$$ 11
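Formula (11) can likewise be compared with an exact enumeration. The sketch below is an assumption-laden illustration: the populations in `POPS` are hypothetical, one observation is drawn from each, and the symmetric sum $2 S P^i_{11} P^j_{11}$ over unordered pairs is written as a sum of $P^i_{11} P^j_{11}$ over ordered pairs $i \neq j$, alongside $P^i_{20} P^j_{02}$ over the same ordered pairs.

```python
from fractions import Fraction
from itertools import product

# Hypothetical populations, each a list of (x, y, probability) triples.
POPS = [
    [(0, 0, Fraction(1, 2)), (1, 2, Fraction(1, 2))],
    [(1, 0, Fraction(1, 3)), (3, 1, Fraction(2, 3))],
    [(2, 5, Fraction(1, 4)), (0, 1, Fraction(3, 4))],
]

def means(pop):
    return (sum(p * x for x, _, p in pop), sum(p * y for _, y, p in pop))

def P(pop, a, b):
    """Population moment P_ab about the population's own means."""
    mx, my = means(pop)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in pop)

def exact_variance(pops):
    """E[(p_11 - E p_11)^2] over all samples with one draw per population."""
    n = len(pops)
    first = second = Fraction(0)
    for draw in product(*pops):
        prob = Fraction(1)
        for _, _, p in draw:
            prob *= p
        xbar = Fraction(sum(x for x, _, _ in draw), n)
        ybar = Fraction(sum(y for _, y, _ in draw), n)
        p11 = sum((x - xbar) * (y - ybar) for x, y, _ in draw) / n
        first += prob * p11
        second += prob * p11 ** 2
    return second - first ** 2

def formula_11(pops):
    """Right hand side of formula (11)."""
    n = len(pops)
    n1 = n - 1
    mu = [means(pop) for pop in pops]
    a_bar = sum(m[0] for m in mu) / n
    b_bar = sum(m[1] for m in mu) / n
    A = [m[0] - a_bar for m in mu]
    B = [m[1] - b_bar for m in mu]
    idx = range(n)

    def p(i, a, b):
        return P(pops[i], a, b)

    diag = n1**2 * sum(p(i, 2, 2) - p(i, 1, 1) ** 2 for i in idx)
    cross = sum(p(i, 2, 0) * p(j, 0, 2) + p(i, 1, 1) * p(j, 1, 1)
                for i in idx for j in idx if i != j)
    t3 = 2 * n1 * sum(p(i, 2, 1) * B[i] + p(i, 1, 2) * A[i] for i in idx)
    t2 = sum(p(i, 2, 0) * B[i] ** 2 + 2 * p(i, 1, 1) * A[i] * B[i]
             + p(i, 0, 2) * A[i] ** 2 for i in idx)
    return (diag + cross) / n**4 + t3 / n**3 + t2 / n**2

exact = exact_variance(POPS)
predicted = formula_11(POPS)
print(exact == predicted)
```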


3. The Mathematical Expectation of ${}_2m_{p_{ab}}$. We now return to the general
equation

$${}_2M_{p_{ab}} = n^{-1} \bar p_{2a,2b} - \bar p_{ab}^2 + 2 n^{-2} \underset{i,j=1}{S} E(x_i - \bar x)^a (y_i - \bar y)^b (x_j - \bar x)^a (y_j - \bar y)^b. \tag{3.11}$$


* Since $E(u_i^2 v_i^2) = P^i_{22}$, $E(u_i^2 v_j^2) = P^i_{20} P^j_{02}$, etc.

10 See Biometrika, Vol. XIII, p. 295.

11 See Biometrika, Vol. XXI, p. 234, Cor. 1.






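The first two terms are given by (5). To evaluate the last term we write:

$$S E[(x_i - \bar x)^a (y_i - \bar y)^b (x_j - \bar x)^a (y_j - \bar y)^b] = S E[(U_i + A_i)^a (V_i + B_i)^b (U_j + A_j)^a (V_j + B_j)^b]$$
$$= S E(U_i^a V_i^b U_j^a V_j^b) + \underset{r_1, r_2, r_3, r_4 = 0}{\overset{a,\; b}{S'}} C^a_{r_1} C^a_{r_2} C^b_{r_3} C^b_{r_4} \, S E(U_i^{\alpha} V_i^{\gamma} U_j^{\beta} V_j^{\delta} A_i^{r_1} B_i^{r_3} A_j^{r_2} B_j^{r_4})$$
$$= n^{-2(a+b)} S E[(n_1 u_i - \cdots)^a (n_1 v_i - \cdots)^b (n_1 u_j - \cdots)^a (n_1 v_j - \cdots)^b]$$
$$\quad + \underset{r_1 \cdots r_4}{S'} n^{-(\alpha + \beta + \gamma + \delta)} C^a_{r_1} \cdots C^b_{r_4} \, S E[(n_1 u_i - \cdots)^{\alpha} (n_1 v_i - \cdots)^{\gamma} (n_1 u_j - \cdots)^{\beta} (n_1 v_j - \cdots)^{\delta} A_i^{r_1} B_i^{r_3} A_j^{r_2} B_j^{r_4}], \tag{3.31}$$

where $\alpha = a - r_1$, $\beta = a - r_2$, $\gamma = b - r_3$, $\delta = b - r_4$.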


The right hand side of (3.31) has been broken up into two parts because the
first part is symmetrical, while the second part, in general, is not except when
$r_1 = r_2$, and $r_3 = r_4$.

Let us now consider the expression

$$S E[(n_1 u_i - \cdots)^a (n_1 v_i - \cdots)^b (n_1 u_j - \cdots)^a (n_1 v_j - \cdots)^b]. \tag{3.32}$$

This is a double summation in which $c_{ij} = c_{ji}$, and in which the diagonal terms,
$c_{ii}$, are missing.

Consider next a general term of k factors from the expansion of each bracket
of (3.32). As we are dealing with symmetric functions, there will be no loss in
generality if we consider the first k subscripts only; and if we let the lower limits
of the exponents of the u's and v's begin with zero we may consider that each
parenthesis of a given bracket contributes exactly k factors. Such a term,
omitting the coefficient, may be written as follows:

$$E(u_1^{\alpha_1} v_1^{\beta_1} \cdots u_k^{\alpha_k} v_k^{\beta_k} \, u_1^{\alpha'_1} v_1^{\beta'_1} \cdots u_k^{\alpha'_k} v_k^{\beta'_k}) = \Pi\, E\, u_h^{\alpha_h + \alpha'_h} v_h^{\beta_h + \beta'_h} = \Pi\, P^h_{(\alpha_h + \alpha'_h)(\beta_h + \beta'_h)}. \tag{3.33}$$

This term occurs in every one of the $\tfrac{1}{2} n n_1$ brackets of (3.32), having the same
numerical coefficient in every one of them, which is

$$\frac{(a!)^2 (b!)^2}{\Pi \alpha_h!\, \Pi \alpha'_h!\, \Pi \beta_h!\, \Pi \beta'_h!}. \tag{3.34}$$


To obtain the n\ coefficient of (3.33) we break up (3.32) into the following partial 
summations: 

El(n,u, — • • • Httir, - • • • ) b (niu, - • • ■ )“(n jVj — • ■ • )*] = E[(nm — • • • )* 
(mm - ■■■)*' (niu, - ■ • • ) a (n t V} —•••)*] + •••+ #[(«i«*-i - • • • )° 





0 » 1 V *.-1 — ••• ) 4 (« 1 «U — ••• ) n (riiV k - ... ) 6 1 + s — •••)*' 

(mt), - ...y s (niu, — • • • )°(n 1 n i — • • • ) & 1 -f S [E{(niU t - • • •)“ 

;■**/> -FI J i,; **A. 4 * 1 

(«itv — • • • • ) a (niv, — ... )*}] . 

From this equation we get for the total coefficient in n of the term (3.33) the 
following expression: 

S (-m) a '‘ + ‘V + ' ,k+( V + n L 5 + (_ Wl )V<] + CJ* . 

h,h’~l A-1 

The following restrictions on the α's and β's must be observed.

(a) $\alpha_1 + \alpha_2 + \cdots + \alpha_k = a$, $\quad \alpha'_1 + \alpha'_2 + \cdots + \alpha'_k = a$

(b) $\beta_1 + \beta_2 + \cdots + \beta_k = b$, $\quad \beta'_1 + \beta'_2 + \cdots + \beta'_k = b$

(c) $\alpha_h + \alpha'_h + \beta_h + \beta'_h \neq 1$.

From (c) we obtain the upper limit of k, namely: t = a + b.
Combining the various above equations we finally obtain: 

(*)«•+» Cf* Vj) = (a !) 2 (6 !)* S fif 

, /'" 1 \=0 


A-=l 


k 

s 

1 


S (-/u) au ^ f V + V + 72, S [(-wi)^* 4 + (-wi)VA] +c 2 


-A 

/ 


II ?&+.' 


) (fll+SJ 


Ila*! Hal ! ri/j,! IT/S; ! ‘ 


(3.35) 


Turning to the second part of (3.31) let us consider the expression

$$S E[(n_1 u_i - \cdots)^{\alpha} (n_1 v_i - \cdots)^{\gamma} (n_1 u_j - \cdots)^{\beta} (n_1 v_j - \cdots)^{\delta} A_i^{r_1} B_i^{r_3} A_j^{r_2} B_j^{r_4}]$$

for a given set of r's. The term (3.33) may also be considered as a general
term of this last expression; of course, the exponents of the u's and v's will be
different in this case. In order to evaluate the complete coefficient of a term
like (3.33) we again write:

SE[(n lUl -Hmr.-)>(»i Mj -)'(«*,- ) s A[ l B[ 3 A r , 2 B r ; 

= E[(mi\ - ■ ■■ )“(«iC) — • • • )>(mi« ? - • • • ) s (niv 2 — ■ ■ ■ YA^B^A^Bz*] 

+ E[niu 2 — • • • ) a (nit’« - . • • H/qiti - • • • )^(nii>, - • • • yA^B^A^Bl* 

+ • • • + E[(niu k - - ... )“(«,!>* - ... )y(n 1 u i -i - • • • ) ,i (ni^_t — • • • )* 





A?B?A[l x B r k U\+ SE[{n x u x -)-(«,»,-X’#; 3 S 

»”1 j*-Hl 

(«1 M,- y(n iV , -)M , J , J8 r /l + 5 !?[(»,»,-)*(»,»,-)* 

; ~ l 

A S (»,«,--K4 ; 1 B; s ] + S E[{n\u, -)« 

t“A-hl t, j=k 4-1 


n lVl - ) y (ntUj -)<W,- YA: i B?A r ;B r ;]. 


(3.36) 


It is now quite easy to write down the complete coefficient of a term of the
form (3.33). The numerical coefficient of this term is the same in every bracket
of (3.36), and is

$$\frac{(-1)^{S r_i} (a - r_1)!\, (a - r_2)!\, (b - r_3)!\, (b - r_4)!}{\Pi \alpha_h!\, \Pi \alpha'_h!\, \Pi \beta_h!\, \Pi \beta'_h!}. \tag{3.37}$$


The coefficient in ih and ft'*.4 ) 2 B'/ is broken up by (3.36) into the fol¬ 
lowing four parts: 

I. S (— ri\) h h ' * h * h 'A i h l B , h 3 A' h W t h i 9 from the first k(k — 1) brackets. 


k 


A~1 A'~A+1 A- 1 

[ Qrsr4_ „£ 1 ' 4; ‘ 2/i; ‘ 4 ]’ 

from the next k(n — fc) brackets. Similarly 

III. 8 {-n l ) ah '^"'A' h m r AQr^- 8 .4^*1 from the next *(«-*). 

A'=l L A *= 1 J 

And finally: 

IV. S = SA r h 'B r /sA r h *B r h 4 - *SAi ri+r2) M; 3+r4) 

A. + 1 1 1 1 

- S s S AlW r ,A- 8 AVBV S A'„‘B i* 


a I, l 


A* 1 


A '==1 


A=1 


A~1 


+ 2 s acb;* s jWbj# = Qr,„Q„,4 ~ Q(^i-f >2)034->-4) ~ Qnn SA r h *B r h < 

A —1 A'-l 1 


-QwSAI’B; 3 - S A'JBI'AIW* + 2 S S AJWI*,from the 

1 A,A'=1 A-l A1 

last cj* brackets. 





The restrictions on the α's and β's differ from those given above in that a is
replaced by $a - r_1$ and $a - r_2$, and b by $b - r_3$ and $b - r_4$; and from the restriction
(c) we get for the upper limit of k, in this case,

$$t_1 = \frac{\alpha + \beta + \gamma + \delta}{2} = a + b - \frac{r_1 + r_2 + r_3 + r_4}{2}$$

when $S r_i$ is even, or the greatest integer less than $a + b - \frac{S r_i}{2}$ when $S r_i$ is odd.

Combining (3.37) with $C^a_{r_1} \cdots C^b_{r_4}$ we get for the general numerical coefficient
in the expansion of the second part of (3.31), the expression

$$\frac{(-1)^{S r_i} (a!)^2 (b!)^2}{\Pi r_i!\, \Pi \alpha_h!\, \Pi \alpha'_h!\, \Pi \beta_h!\, \Pi \beta'_h!}.$$

By an obvious manipulation we have 

I + II + III + IV = 8 \(-nd° h+ * h ' a, ‘' + * h '-l^AWMVBV + Qw 

* - 1 + Qnn^ [(-m) “" + *" - 1 lit;**; 4 


s (-«.) 


!.[ 

- s yis [(-».) s avb ;* 

/* — i /j==i L J //-i 

Qr 2 r4 Q(rif r2)(rs-fr4) • (3.38) 


Finally, combining the various equations we get the formula:


a , b 


t M„ ah = n-'vwh - PU + 2 («)- 2( “ + ‘ +1 ' (a !)*(& !)* *8 S 8 


,k " 1 al l ,a'„ffh,p '.,-0 k ~ l 

ti h 


8 ( — ni )"‘ +f, ' ,+a *' 1 fi/i _|_ ?u . s [(—ni) a * +<l * + (—rti)”^ 3 *] + C£* 

h,h'-~l h- 1 

IIP M (a* +jO_(ft_+ft) , 2(„)-2(«+6+i) ( a T)2(51)2 $ g ( — n) Sn Sr, 

Ifa*! lift! n«; ! H/s; ! ' ' ,*-1 r^r^o Hr,! 


a.P.y.S 

s 


s { 8 K-Wi) 


«A+-0/i+« 1 

n A 


1 


-1] A r h 'B T t >A r *Ji T ' 


- s k-»,)«+» - 1] s a;*!?;* - 5 -1] 

*-l A-l 

S AJCBI' + Q,^ S - 1 ] 

ft -1 ft=~1 





4“ Qrirz S [(— 7l\) h h 1] A h 'B h ' + QnnQnn — Q(ri+r 2 ) (r34-r4) 

*-1 

A P la + a h ) (^ h ) 

iTa h \np h \na' h \np' h \' 

In case the n populations are identical the second part of (12) must vanish,
and in the first part the summations


£ IT AaV+V) (0K+(i') 

1 A-l h h 


k ! Cl n P ( 


k “ * (cth+a h )((3h±t3 h ) 


li ! h ! U 


where l h k, • • • L are the number of repetitions of the pairs of integers 
(ai + «() (ft + £(), • • • (at, + <*[) (ft + Pi), respectively. 

We then have the following

Corollary: The mathematical expectation of the variance, ${}_2m_{p_{ab}}$, of the product
moment, $p_{ab}$, in samples of n from a single infinite population is given by

zM vah = p2a2b - vlb + 2(tt)“ 2(0+6+1) («!) 2 (b0 2 h 


* fc! O 

A Zi!/ 2 ! ... 4! 


5 (-nj.S [(_n,)«+» 

/i-i 


II P(ofo + a h) (Ph + ft) 

+ (-m)V^] + 1 - T —. 

r ii«*! uft i n«; i rift;! 


(120 


4. The Formula for ${}_2M_{p_{21}}$. Formula (12) can by no means be used mechanically.
It does, however, summarize to a great extent the details in finding
${}_2M_{p_{ab}}$ for any given values a, b. Formulae for ${}_2M_{p_{21}}$, ${}_2M_{p_{31}}$ have been obtained,
but the one for ${}_2M_{p_{31}}$ is too long to be included in the paper, especially
since with a little work it can be easily derived by applying (12). The one for
${}_2M_{p_{21}}$ is given immediately below.

2 M„ 21 = n-'{n*nlS[P\ t - (PI J 2 ] + n\S[P\ 0 P'o 2 + 4 (P\,P[ 2 - n 2 PhPii)] 
-2nln s SPUP! i0 + (n\ + 2)S(P' 20 PloPU +SPioPhP k n) +SSPl„Pi Q P k Qi } 
+ 2n-*{r h nlS(P\ l B t + 2P' 32 A l - P^Pii#, - 2P' 21 P\ l A l ) 

- 4n^(Pl 0 BJ’i l +Pi t AJ>i 0 )-2n t 8ln t P t , l BJ>l 0 +2(2n t -3)P' ai AJ>i 1 } 

+ QnSP^Pi.A, + 4 rnSiPioPitB, + Pj 0 PJ s 4 y + PUA x P{ 2 + 2P’ 1 P' 0 B ) 

+ P\ 2 Pi 0 A,)} +n~*{nl ,- (Pj„P,)’] + 4SPJ „ P| 0 (P. + B,)* 

+ 3(n 2 + ih)SPi,A\ + 4SP' 20 Pq 2 (A, + A,y - 2n 4 -S[P 2 < oPS 2 ^: 





+ 2P\ l P{ 1 (Ai + AM + 16SP{ i A l Pi l A l - 4n’,S(P* 1 .4,) 2 
+ 4(2 n\ + rh)SP\ l A l B l — 4n 4 SP; ^4,-B.P^o — Sn 3 SPnP^ 0 A,Bj 
+ 8 SiPUBfioA, + Pl.yl.PjoB.) - 4nlSP[ 1 A l P' 20 B l 

— 2niitinr 1 S(Q w Pl 2 + 2QnP 3 i) + 2n 2 n“ , <S[6QuPi 1 P 20 

+ QUPioPii + 4 P\iPii)]} + 2n-*{Tm 2 S(2Pl 0 A t Bl + 2P' l2 Al + 5Pi 1 A* i B t ) 

- nMoiPhBi + P\ t A,) + 2Qn8P; 0 P, + 2P‘ 1 4,)]} + n-*{n*S[P' 0i A\ 

+ 4(P* 0 ^^B? + P[ ^B,)} - 2nSm 0 A,B t + QuA*)P \, + QnP' 20 A t B, 

4* Q 20 PJ tA *] S[Q| 0^*0 2 4~ 4Qjo(QuPJ 1 Qi 1 P 20)]} • (13) 12 

Chapter IV. The Mathematical Expectation of the Third Moment of $p_{11}$

1. The Mathematical Expectation of ${}_3m_{p_{11}}$. Following the notation of the
last chapter we shall denote the third moment of $p_{11}$ about its mean by ${}_3m_{p_{11}}$ and
the mathematical expectation of ${}_3m_{p_{11}}$ by ${}_3M_{p_{11}}$. We have then by definition,

$${}_3m_{p_{11}} = \{n^{-1} S(x_i - \bar x)(y_i - \bar y) - \bar p_{11}\}^3,$$

and by a well known formula we have:

$${}_3M_{p_{11}} = \overline{p_{11}^3} - 3\, {}_2M_{p_{11}} \bar p_{11} - \bar p_{11}^3. \tag{4.11}$$
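The "well known formula" (4.11) is the usual relation between the third central moment, the raw third moment, the variance and the mean. A minimal sketch with a hypothetical discrete distribution (the values and probabilities below are illustrative only) checks it in exact arithmetic.

```python
from fractions import Fraction

def third_moment_two_ways(values, probs):
    """Return E(p - mean)^3 computed directly and through (4.11)."""
    mean = sum(q * v for v, q in zip(values, probs))
    raw2 = sum(q * v ** 2 for v, q in zip(values, probs))
    raw3 = sum(q * v ** 3 for v, q in zip(values, probs))
    variance = raw2 - mean ** 2
    direct = sum(q * (v - mean) ** 3 for v, q in zip(values, probs))
    via_formula = raw3 - 3 * variance * mean - mean ** 3
    return direct, via_formula

# Hypothetical values of a statistic p and their probabilities.
values = [Fraction(-1), Fraction(1, 2), Fraction(3)]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
direct, via_formula = third_moment_two_ways(values, probs)
print(direct, via_formula)
```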

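The last two terms of (4.11) are given by (1) and (11). To evaluate $\overline{p_{11}^3}$ we
write:

$$\overline{p_{11}^3} = E\{n^{-1} S(x_i - \bar x)(y_i - \bar y)\}^3 = n^{-3} S E(x_i - \bar x)^3 (y_i - \bar y)^3$$
$$\quad + 3 n^{-3} S E(x_i - \bar x)^2 (y_i - \bar y)^2 (x_j - \bar x)(y_j - \bar y)$$
$$\quad + 6 n^{-3} S E(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)(x_k - \bar x)(y_k - \bar y).$$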

The first term is simply $n^{-2} \bar p_{33}$ which is given by (10). The evaluation of the
second term is not essentially different from the evaluation of the left hand side
of (3.22), and since all details have been given there we shall omit them here.
To evaluate the last expression let us write:

$$S E(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)(x_k - \bar x)(y_k - \bar y)$$
$$= S E[(U_i + A_i)(V_i + B_i)(U_j + A_j)(V_j + B_j)(U_k + A_k)(V_k + B_k)]$$
$$= S E(U_i V_i U_j V_j U_k V_k) + S E(U_i V_i U_j V_j U_k B_k) + \cdots + S E(A_i B_i A_j B_j A_k B_k). \tag{4.12}$$


12 In case the n populations are identical this reduces to one of Pepper's formulae,
Biometrika, Vol. XXI, p. 238, Cor. 1.





As there is a great deal of similarity among the various terms of the right hand
side of (4.12), it will not be necessary to go into the details of the expansion of
every one of them. We shall, therefore, indicate the details for the expansion
of only two of them, one symmetrical and one non-symmetrical; and as the
first two terms are of that type we shall use these for the purpose of illustration.
Using the u, v notation we have

$$S E(U_i V_i U_j V_j U_k V_k) = n^{-6} S E[(n_1 u_i - \cdots)(n_1 v_i - \cdots)(n_1 u_j - \cdots)(n_1 v_j - \cdots)(n_1 u_k - \cdots)(n_1 v_k - \cdots)].$$

The maximum number of subscripts appearing in any term evidently being 3, we 
can write without any loss in generality: 

SE[(niU x - • • •) ... (?hv k - ...)] = E[(nm - • • • )(niv k - • • • )(?iiu 2 - • • • ) 

Wh - • ■ • )(n l u i - • • • -•••)] + #{(wiMi - • • • ){n\V\ - • •. )[(niu 2 - •. •) 

(nm - • • • ) + (riiUs — • • • )(nii> 3 - •••)] + (niu 2 — • • • )(n^ 2 - • • • ) 

(niu 3 - • • • )(?hvz - ... ) JSCniW. - • • • )(niv t - • ■ • ) + E{(niUi - • • • ) 

4 

(»i»i — • ■ • ) + («i «2 — • • ■ ) (nit ’2 — • ■ • ) + — • • • )(niv 3 — • • • )! 

S(niU t - ■■■ )(nii\ - • • • ){n x u, - • • • )(niv, — • • • ) + SE{ («iw, - ...)••• 

4 4 

(niv k -••-)}. (4.13) 

The coefficients of the various terms arising in this expansion can now be 
found quite easily. For example, the coefficient of P\ 3 , which is, of course, the 
same as the coefficient of P x 3 3 , is easily found to be 


+1) + + ® + 

Zi O 


nitini('Sni — 2 ) 
6 


To evaluate the summation SE ( U v V , U,V,UJh) = — •••) 

(iiiVi — ) (riiu, — • ■ ■ )(itiv 1 — • • • )(niVk — ■ ■ • )B k ], we break it up into 

partial summations as follows: 

SE[(niU, - ■■■ )(mVi - ■■■ )(niu l - ... )(mv, -) {>m k - ... )B k ] 

= E {(niWi - • • • )(nit>i - • • • )[(ni?i 2 - • • • )(niVg — ■■■ )(thu 3 — ■ • • )B 3 

+ (ni«2 — • • • )Bt(niUt — ... )(mvs — •••)] + (mui — • • • )Bi(raut — • • • ) 
(n,» 2 — • • • )(niu 3 — ... )(niv 3 — • • •)) + E{(niUi - ... )(mvi — 

[(niU2 — ... )(mt> 2 - • • • ) + (wi«3 - • • • )(n iv 3 - •••)] + - • • • ) 

(n,v g - ... )(niu 3 - ... )(mv 3 - )}S(niu, - ■ ■ ■ )B,- + E{(niUi — ■ ■ •) 

4 

(niVi — )[(niu 2 — •■•)£* + (niu 3 — ■ ■ • )B 8 ] + (niUt — ... )(n t v t — ... ) 





[(ftiUi — • • • )Bi + (niW 3 — • • • )P 3 ] + («i m 3 — • • • )(niV 3 — ... ) 

[(niut — ■■■ )B i + (n,v 2 — • • • )B 2 ] j/S(niU,- — ■■■ )(mv, — • • • ) 

4 

+ E {(wiMi — . • • )(niVi - • • • ) + (niU 2 — • • • ) (mo 2 - ■ ■ ■ ) + (niu 3 - ■■■) 
(mv 3 — • • •)} S(niu, — ■■■ )(niv, — ■ ■ ■ )(mu, — ■ ■ ■ )B, + E{(niUi — ■ ■ - )Bj 

4 

4- ( n 2 Ui - ■ • - )Bi + (niu 3 - ■ ■ ■ )B 3 }S(niU l - ... )(niv x - • • • ) 

4 

(niu, — • • •) (niv, — ...) + ES(niU, — ■■ ■ )(n l v, — • • • )(niu, — • • •) 

4 

(mvj - • • • )(niu k — ••*)£*. (4.14) 


The expansion of (4.14) is not as difficult as it appears, for only two subscripts
can appear in any term: the explicit appearance of the subscript 3 is due to the
fact that we are dealing with a triple summation. Consequently, we do not
need to expand those parentheses in which B appears.

We shall now, without any further details, state the final result, which is: 

,M, n = n~«{S[n\P' 33 - PUPl 3 + Sn/P^PJ, + P^P’J + 3n,(n? + 2)PJ 2 P{ 1 

- 3(2n 2 + 1 )P' 2> P[ 2 + 3n 3 P; i P„' 2 P‘ 0 + 6(»J + 3n x - 2)P| l P{ l P* 1 ] 


- Sn^lJSCnJPi, + P5 oPo 2 - n 2 (PIi)’ + 2PIAJ0] - nJCSPj,) 3 ]) 

+ 3»-*{Sl»J(P*A + P$A) + 2a(PJ,Pi A + PIAJA) 

- 2 n x (P{ 1 pi a + pjam.) - 2 » 1 (p{.pj a 4 - p;,pm.) 

4- (PfAiA 4- P* Ai A) - 2«,(PjjPJA + PJAo’A) 

4- (pjAo'A + PoAiAW + ^{^(PJA! + p; 3 4?) 

4- n,(PiAI A? + PJAM!) - (PJ Ai«A! 4- PJAIA!) 

- 2(Pi 0 B i P’ u B, 4- Pj AP/A) 4- 2n 1 P’^A - 2P' 20 B,Pt >i A, 

- 2P' 11 A x P’ l B, + 2n 3 P\ x P’ u A x B x - 2{P[ l YA x B l \} + n"*{S[(PiA! + P' 03 A 3 X ) 

4-3 (P^Atf + PUA'B,)]}. (14) 13 

Where a = n\ + «i 4- 1. 

This formula is shorter and simpler than the formula for 2 M;j 21 , although they 
are of the same order. This is due to the symmetry of 3 Mj. u . 


Chapter V. Product Moments of Trivariate and Quadrivariate Populations 

1. Some additional definitions and notation. In this chapter we shall indicate 
briefly how the method of the previous chapters may be extended to populations 


13 Cf. Biometrika, Vol. XXI, p. 253, formula (19).





of more than two variables. We shall do this by deriving some of the simpler 
formulae, corresponding to those of Chapter II, for trivariate and quadrivariate 
populations. 

The notation will be slightly changed in that we shall symbolize the new
variables by priming the symbols for the variables used in the previous chapters.
Thus, we shall indicate the $k$th trivariate population by $(X_k, Y_k, X'_k)$ and the
$k$th quadrivariate population by $(X_k, Y_k, X'_k, Y'_k)$, and samples from such
populations by $(x_k, y_k, x'_k)$ and $(x_k, y_k, x'_k, y'_k)$ respectively.

We shall denote by $P^m_{ijk}$ the product moment of the $m$th population of order
$i$ in $X$, $j$ in $Y$, and $k$ in $X'$, and by $P^m_{ijkl}$ the similar product moment for a
quadrivariate population. These are defined by the following equations:

$$P^m_{ijk} = E[(X_m - a_m)^i (Y_m - b_m)^j (X'_m - c_m)^k], \qquad (5.11)$$

$$P^m_{ijkl} = E[(X_m - a_m)^i (Y_m - b_m)^j (X'_m - c_m)^k (Y'_m - d_m)^l], \qquad (5.12)$$

where $a_m$, $b_m$, etc. are defined as in Chapter I, part 2.

The sample product moments corresponding to $P^m_{ijk}$, $P^m_{ijkl}$ will be denoted
by $p_{ijk}$ and $p_{ijkl}$ respectively. They are defined by:

$$p_{ijk} = n^{-1} \mathop{S}_{m=1}^{n} (x_m - \bar{x})^i (y_m - \bar{y})^j (x'_m - \bar{x}')^k, \qquad (5.13)$$

$$p_{ijkl} = n^{-1} \mathop{S}_{m=1}^{n} (x_m - \bar{x})^i (y_m - \bar{y})^j (x'_m - \bar{x}')^k (y'_m - \bar{y}')^l. \qquad (5.14)$$

Finally we shall designate $E(p_{ijk})$ and $E(p_{ijkl})$ by $\bar{p}_{ijk}$ and $\bar{p}_{ijkl}$ respectively.
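
As an illustration of definition (5.13), the following minimal sketch computes a sample product moment of a trivariate sample. The function name `sample_product_moment` and the small data set are hypothetical and serve only to show the arithmetic; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def sample_product_moment(xs, ys, xps, i, j, k):
    """Return p_ijk = n^-1 * S (x_m - xbar)^i (y_m - ybar)^j (x'_m - xbar')^k."""
    n = len(xs)
    if n == 0 or not (len(ys) == len(xps) == n):
        raise ValueError("the three samples must be non-empty and of equal length")
    # Convert to exact rationals so the moment is computed without rounding.
    xs, ys, xps = ([Fraction(v) for v in s] for s in (xs, ys, xps))
    xbar, ybar, xpbar = sum(xs) / n, sum(ys) / n, sum(xps) / n
    total = sum((x - xbar) ** i * (y - ybar) ** j * (xp - xpbar) ** k
                for x, y, xp in zip(xs, ys, xps))
    return total / n

# Hypothetical sample of three observations (x, y, x').
print(sample_product_moment([1, 2, 3], [2, 4, 6], [1, 0, 2], 1, 1, 1))
```

The quadrivariate moment (5.14) would follow the same pattern with a fourth factor.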

2. The Mathematical Expectation of $p_{111}$ and $p_{212}$. By definition we have

pm = E[n~ l S(x, — x)(y, - y)(x' t - x')l- (5.21) 

Applying the transformations (1.17) this equation becomes 

np m = imU % + A t )(V t + B t )(Ui + C t )] = 8E(U % V t U[) + SE(U t V t C x ) 

+ SE(U t UiB t ) + SEiV.UiA,) + vanishing terms + SE{A v B x CX (5.22) 

Since EA X B X C X = A X B X C X} SE(A t B x C x ) = SA x B t C x . Following the previous 
notation we shall put SA t B t C x = Qm- 

When the expression SE(U t V t U[) is expanded, no other non-vanishing terms 
except those of the form E(u t v t u[) = P J n can appear. The coefficient of this 
term will evidently be the same as that of P 21 in (2.23), namely: n~ 2 n\n 2 . 
Whence: 

SE(U x V t U[) = n-'n^SPU. 

The three terms following the first of (5.22) are by (2.24) equal to 

n- 1 n l S(PU,C t + PUiB t + P i Q11 Aj. 





We thus get: 

Pm = n * n \ ^2<SPJ, 1 + n~~ 2 n2S(P\ Ui C t + Pioi-^* + ^011 -^i) 4* n ~ l Qm • (15) 
With the aid of the formulae of II, 3 we easily find the formula 
P 212 ** n 6 {(^i 1 )SPJu + (2n? + n 2 )S(P l 2 ooSPqh + 2P} 10 aSPI 0 i — P 2 00^0 1 1 

- 2P; io P{oi)} + + l)S(PioA + PJioC. + 2Pi n A,)} 

+ n“ 3 {(n? - 1 )S(PJ„A? + 2P1 0 ^A + 2PJ 10 A t C t + Pj 0 o^) + <2,ooSPSii 
4" 2Qno>SPi 01 + 2Q m SP 110 + QoiiSP 20 o) ! 4~ n 1 ^2ii • (lfi) 


3. The Mathematical Expectation of $p_{1111}$. The procedure for finding the
formula for $\bar{p}_{1111}$ is very similar to the above. We shall therefore merely state
the result.

Pirn — n 5 {(^i — DSP mi + (2ft i + ^ 2 )MPl HK^Poou + ^looi^ouo 
+ PlouPliod} + + 1)*S(P; 110 A + PIioiC, + PJoiA + 

+ rrH(nl + 1)S(P[ 10 qC % D % + P{ou>B % D t + P' 0UQ A x D t + P' 0l0l A % C % 


+ P M\\A t B x + P \QoiB t C % + SCQoonP 1100 4~ Qoioi^ioio + QlOOl^OllO 4" QlOlO^Ol 


4- QnooP0 011 


i) 1 4- n 1 Qi) 


Washington University, St. Louis. 



AN APPLICATION OF ORTHOGONALIZATION PROCESS TO THE 
THEORY OF LEAST SQUARES 

By Y. K. Wong 


Introduction 


The present paper is an outgrowth of the writer’s attempt to fill a lacuna in the 
discussion of the Gauss method of substitution as given by many writers. For 
illustration, let us cite Brunt’s Combination of Observations. In Chapter VI, 
we find: 

Let the normal equations be

    [aa]x + [ab]y + [ac]z - [al] = 0
            [bb]y + [bc]z - [bl] = 0          (i)
                    [cc]z - [cl] = 0

From this equation we find

$$x = -\frac{[ab]}{[aa]}\,y - \frac{[ac]}{[aa]}\,z + \frac{[al]}{[aa]}. \qquad \text{(ii)}$$

Substituting, we obtain

    [bb.1]y + [bc.1]z - [bl.1] = 0            (iii)
              [cc.1]z - [cl.1] = 0

where

    [bb.1] = [bb] - [ab][ab]/[aa], etc.       (iv)

From the first equation in (iii),

$$y = -\frac{[bc.1]}{[bb.1]}\,z + \frac{[bl.1]}{[bb.1]}. \qquad \text{(v)}$$


In connection with equations (ii) and (v), the question naturally arises as to
whether or not these numbers [aa], [bb.1], ... are all different from zero. Since
$[aa] = \sum a_i a_i$, one can see that $[aa] \neq 0$ if $a_i \neq 0$ for every $i$. However, to show
the non-vanishing of [bb.1], [cc.2], etc. is by no means simple. Many writers do
not give a demonstration on this point. We know that a system of non-homogeneous
linear equations has a solution if the system of equations is linearly
independent. Brunt gives a discussion of the independence of the normal equations
in Chapter V, Art. 36, but he does not state clearly a condition for independence.
He says: “The condition of independence is in general satisfied in



the problems which arise in practice. We can then proceed to the formation
and solution of the normal equations.” It is one of the aims of this paper to
give a necessary and sufficient condition for the independence of the normal
equations and to show that [aa], [bb.1], etc. are all different from zero when the
condition is satisfied.
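
The substitution steps quoted from Brunt can be sketched as follows for three unknowns. This is only an illustrative sketch: the function names `bracket` and `gauss_substitution` and the observation data are hypothetical, and the symbol [cc.2] is formed by repeating rule (iv) one step further. The divisors [aa], [bb.1], [cc.2] are exactly the numbers whose non-vanishing is in question; the code fails with a division by zero when one of them vanishes.

```python
from fractions import Fraction

def bracket(u, v):
    """Gauss bracket [uv]: sum of products of corresponding observations."""
    return sum(Fraction(p) * Fraction(q) for p, q in zip(u, v))

def gauss_substitution(a, b, c, l):
    """Solve the three normal equations by successive substitution.

    Returns the solution (x, y, z) and the divisors ([aa], [bb.1], [cc.2]).
    """
    aa, ab, ac, al = (bracket(a, w) for w in (a, b, c, l))
    bb, bc, bl = (bracket(b, w) for w in (b, c, l))
    cc, cl = bracket(c, c), bracket(c, l)
    # First reduction, rule (iv): eliminate x by dividing through by [aa].
    bb1 = bb - ab * ab / aa
    bc1 = bc - ab * ac / aa
    bl1 = bl - ab * al / aa
    cc1 = cc - ac * ac / aa
    cl1 = cl - ac * al / aa
    # Second reduction: eliminate y by dividing through by [bb.1].
    cc2 = cc1 - bc1 * bc1 / bb1
    cl2 = cl1 - bc1 * bl1 / bb1
    # Back substitution, as in (v) and (ii).
    z = cl2 / cc2
    y = (bl1 - bc1 * z) / bb1
    x = (al - ab * y - ac * z) / aa
    return (x, y, z), (aa, bb1, cc2)

# Hypothetical observations generated from l = 1 + 2t + 3t^2 at t = 0, 1, 2, 3.
solution, divisors = gauss_substitution(
    (1, 1, 1, 1), (0, 1, 2, 3), (0, 1, 4, 9), (1, 6, 17, 34))
print(solution, divisors)
```

With linearly dependent observation vectors, for instance b equal to twice a, the divisor [bb.1] is zero and the substitution cannot be carried out.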

In the theory of least squares, there is the classical method of the derivation of
normal equations by an application of the notion of minimum in differential
calculus. After the normal equations are secured, the Gauss method of substitution
is applied to obtain the solution. Doolittle modifies the Gauss method of
substitution so as to facilitate the labor of computation. However, when the
number of parameters (or unknowns) exceeds 4, Doolittle’s method is quite
complicated. In the present paper the writer wishes to present a mathematical
discussion of a method obtained through an application of the Gram-Schmidt
orthogonalization process. This method furnishes us a new procedure for determining
the most probable values of the parameters (or unknowns). The formulation
of the system of normal equations will be omitted in this new procedure,
which is particularly effective in fitting curves to time series. The paper can
be roughly divided into three parts. The first part gives an algebraic derivation
of the normal equations. The second part derives a condition for a set of
observation data so that the Gauss method of substitution is applicable. The
third part gives a relationship between the Gauss method of substitution and the
orthogonalization process. A practical application of the results of this paper
will be found in a later paper.

The process of orthogonalization has been used in the 19th century, and has
been applied extensively in the theory of integral equations and linear transformations
in Hilbert space. In classical analysis, if $\varphi_1(x), \varphi_2(x), \ldots$, defined
on (0, 1), is a normally orthogonalized system, and if $f(x)$, defined on (0, 1), is
such that $f^2$ is Lebesgue integrable, then the system of Fourier coefficients

$$f_r = \int_0^1 f(x)\varphi_r(x)\,dx \qquad (r = 1, 2, \ldots)$$

has certain interesting properties, one of which is that

$$\lim_{m \to \infty} \int_0^1 \Bigl(f(x) - \sum_{r=1}^{m} f_r\varphi_r(x)\Bigr)^2 dx = 0 .$$

The preceding notion has a close connection with the theory of least squares as 
outlined in many texts on statistics. In section III, the reader will find how 
this notion is applied in the derivation of the normal equations. Since the 
number of dimensions is finite, the integration process reduces to a summation 
process and furthermore no limiting process is used. This new derivation of 
normal equations has the advantage that (1) differential calculus is not used, 
(2) a new form of normal equations is obtained, (3) the solution of the unknowns 
or parameters can be immediately obtained without further application of the 





Gauss Method of Substitution or the Doolittle Method, and (4) the formula for 
the “quadratic residual” is obtained as a simple corollary. 

From the results in section III, we see immediately what condition should be
imposed upon the set of observation data so that the Gauss method of substitution
may be applicable. In section VI, we find a necessary and sufficient condition
for the independence of the system of normal equations (3.9), and also the
fact that when this condition is fulfilled, then, due to the special nature of the
coefficients of the unknowns, we see that the matrix is properly positive. It is
on account of this fact that we are able to show that the numbers [aa], [bb.1], etc.
are all different from zero. The demonstration of this point is found in section
VII. In this section, we lay down a fundamental hypothesis for Gauss’s method
of substitution, namely, the set of observations $A_i = (a_{i1}, \ldots, a_{in})$, $i = 1, 2, \ldots, r$,
is linearly independent. Lemma 7.3 may be called the fundamental
lemma for Gauss’s method of substitution. Some interesting properties of the
numbers $[A_sA_t.h]$, where $s, t = 1, \ldots, r$, and $h$ is less than the smaller one of
$(s, t)$, are demonstrated.

From the properties of the numbers $[A_sA_t.h]$, where $s, t = 1, \ldots, r$ and $h$ is
less than the smaller one of $(s, t)$, and in comparison of the system of equations
(3.7°) with the final form of equations obtained through the application
of the Gauss method of substitution, we can see the relationship between the
Gauss method and the Gram-Schmidt orthogonalization process. If we should
like to give some credit to Gauss, we may say that the orthogonalization process
was known by him, but was stated in a different form.

The writer wishes to remark that certain theorems together with proofs in
sections II, IV, V and VI are obtained from E. H. Moore’s lecture notes. However,
the writer should be responsible for any defect. Finally, I should emphasize
that the use of the notion of positive matrices is only for convenience.

I. Vectors, Inner Products, and Linear Independence 

In this paper, we shall consider vectors of the form¹

(1.10) $(v_1, v_2, \ldots, v_n)$.

For convenience, we shall use capital letters to denote vectors of the type
(1.10).

Let $V = (v_1, v_2, \ldots, v_n)$ and $U = (u_1, u_2, \ldots, u_n)$, then we say $V = U$ if
$v_i = u_i$ for every $i$.

We define $V + U$ by

(1.11) $V + U = (v_1 + u_1, v_2 + u_2, \ldots, v_n + u_n)$,

and $sV$, where $s$ is a number, by

(1.12) $sV = (sv_1, sv_2, \ldots, sv_n)$.

1 If we write $v_i$ as $v(i)$, where $i = 1, 2, \ldots, n$, then $v(i)$ may be considered as a function
of one variable whose range consists of a set of positive integers, $(1, 2, \ldots, n)$. E. H.
Moore defines a vector as a function of one variable.





Hence, $sV = Vs$. In particular, when $s = -1$, we shall put $-V = (-1)V$.
Then $U - V$ becomes a special instance of (1.11) and (1.12).

From (1.11) and (1.12), we see that addition is commutative and associative.

Inner Products: The inner product of two vectors $V = (v_1, \ldots, v_n)$ and
$U = (u_1, \ldots, u_n)$ is defined² to be

(1.2) $(V, U) = \sum_{i=1}^{n} v_i u_i$.

The norm of a vector $V$ is defined by $n(V) = (V, V)$; and the modulus of a
vector $V$ is defined by $\mathrm{mod}(V) = +\sqrt{n(V)}$.

From (1.11), (1.12), and (1.2), we can easily prove the following theorem:

Theorem 1. The symbol ( , ) has the following properties:

($S$) $(U, V) = (V, U)$ for every $V$, $U$; (symmetric property)

($L_s$) $(sV, U) = s(V, U) = (V, sU)$ for every $V$, $U$ and every number $s$;

($L_+$) $(U, (V + W)) = (U, V) + (U, W)$ for every $U$, $V$, $W$; (linear property)

($P$) $(V, V) \geq 0$ for every $V$; (positive property)

($P_0$) $(V, V) = 0$ if and only if $V$ is a zero vector. (properly positive property)
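
The definitions (1.2) of the inner product, the norm, and the modulus can be sketched in a few lines. The function names `inner`, `norm`, and `modulus` are chosen here for illustration only; note that the norm in this paper is the squared length, and the modulus is its positive square root.

```python
import math

def inner(v, u):
    """Inner product (V, U) = sum of v_i * u_i, as in (1.2)."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, u))

def norm(v):
    """Norm n(V) = (V, V); this is the squared length of V."""
    return inner(v, v)

def modulus(v):
    """Modulus mod(V) = +sqrt(n(V))."""
    return math.sqrt(norm(v))

V, U, W = (1, 2, 3), (4, 5, 6), (0, -1, 2)
# Spot checks of properties (S) and (L+) of Theorem 1 on sample vectors.
print(inner(U, V) == inner(V, U))
print(inner(U, tuple(a + b for a, b in zip(V, W))) == inner(U, V) + inner(U, W))
```

Such spot checks illustrate the properties on particular vectors; they are of course not a proof of Theorem 1.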

Linear Independence. A set of vectors $V_1, \ldots, V_r$ is said to be linearly
dependent in case there exist constants $c_1, \ldots, c_r$ not all equal to 0 such that

$$c_1V_1 + \cdots + c_rV_r = 0 ,$$

where 0 is a zero vector.

A set of vectors $V_1, \ldots, V_r$ is said to be linearly independent in case, if the
constants $c_1, \ldots, c_r$ satisfy

$$c_1V_1 + \cdots + c_rV_r = 0 ,$$

each constant $c_i = 0$.

Theorem 2. If the set $V_1, \ldots, V_r$ is linearly independent, then none of the
vectors is a zero vector, and hence the norm of every vector must be different from zero.

For if $V_s$ is a zero vector, then set $c_s = 1$, and $c_i = 0$ for $i \neq s$. It is obvious
that

$$0 \cdot V_1 + \cdots + 0 \cdot V_{s-1} + 1 \cdot V_s + 0 \cdot V_{s+1} + \cdots + 0 \cdot V_r = 0 ,$$

which shows that the set of vectors $V_1, \ldots, V_r$ is linearly dependent, contrary
to the hypothesis.

A more general theorem is stated in

Theorem 3. If the set $V_1, \ldots, V_r$ is linearly independent, then every subset³
is also linearly independent.

We shall prove this theorem in its contrapositive form. The contrapositive
form is as follows: If in the set $V_1, \ldots, V_r$, there exists a subset which is linearly
2 The notation ( , ) was introduced by D. Hilbert. In treatises on least squares, the
notation [ ] is used. The present writer reserves the latter notation for other purposes.

3 Consider a set of integers $(1, 2, \ldots, n)$. Then any combination of this set of $n$ distinct
integers taken $r \leq n$ at a time is called a subset of the set $(1, 2, \ldots, n)$. Likewise, we call
any combination of the set of vectors $V_1, V_2, \ldots, V_n$ taken $r \leq n$ at a time a subset of the
whole set.




dependent, then the whole set is also linearly dependent. Without losing any
generality, let us suppose the subset $V_1, \ldots, V_s$ ($s \leq r$) to be linearly dependent.
Then there exist $c_1, \ldots, c_s$, not all equal to 0, such that

$$c_1V_1 + \cdots + c_sV_s = 0 .$$

If $s = r$, then the whole set is linearly dependent. If $s < r$, then let $c_i = 0$
for $i = s + 1, s + 2, \ldots, r$. Then

$$\sum_{i=1}^{r} c_iV_i = 0 ,$$

which shows the whole set is linearly dependent.

Theorem 4.⁴ A necessary and sufficient condition for the set $V_i = (v_{i1}, \ldots, v_{in})$,
$i = 1, \ldots, r$, to be linearly independent is that there exists a non-vanishing determinant
of order $r$ in the array

$$\begin{matrix} v_{11}, & v_{12}, & \cdots, & v_{1n} \\ v_{21}, & v_{22}, & \cdots, & v_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ v_{r1}, & v_{r2}, & \cdots, & v_{rn} \end{matrix}$$
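
The criterion of Theorem 4 can be tried out directly on small examples. The sketch below, with the hypothetical helper names `det` and `linearly_independent`, searches the array for a non-vanishing determinant of order r by testing every choice of r columns. It is meant only to illustrate the statement; for large arrays an elimination-based rank computation would be far cheaper.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant of a square array by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d  # a row interchange changes the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return d

def linearly_independent(vectors):
    """Theorem 4: some determinant of order r in the r-by-n array is non-zero."""
    r, n = len(vectors), len(vectors[0])
    if r > n:
        return False  # no determinant of order r exists in the array
    return any(det([[v[j] for j in cols] for v in vectors]) != 0
               for cols in combinations(range(n), r))

print(linearly_independent([(1, 0, 1), (0, 1, 1)]))
print(linearly_independent([(1, 2, 3), (2, 4, 6)]))
```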

II. Gram-Schmidt’s Orthogonalization Process 

For the present section and the sequel, we shall adopt the notation $A_i = (a_{i1}, \ldots, a_{in})$,
$B_i = (b_{i1}, \ldots, b_{in})$, and $C_i = (c_{i1}, \ldots, c_{in})$ for $i = 1, 2, \ldots, r$.

Theorem 5. For every set of vectors $A_1, \ldots, A_r$, there exists uniquely a set of
vectors $B_1, \ldots, B_r$ such that

5.1) $(B_t, B_s) = 0$ $(t \neq s)$.

5.2) For every $t$ satisfying the relation $1 \leq t \leq r$, $A_t$ is a linear combination
of $B_1, \ldots, B_t$, and $B_t$ is a linear combination of $A_1, \ldots, A_t$.

5.3) $B_1 = A_1$; and for $t > 1$, $(B_t - A_t)$ is a linear combination of
$B_1, \ldots, B_{t-1}$, and is also a linear combination of $A_1, \ldots, A_{t-1}$.

5.4) If $t > 1$, then $(A_s, B_t) = 0$ for every $s < t$.

5.5) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t)$ for every $t$.

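To prove this theorem, let us define

(2.1)
$$B_1 = A_1 ,$$
$$B_2 = A_2 \quad \text{if } n(B_1) = 0 ,$$
$$B_2 = A_2 - \frac{(A_2, B_1)}{n(B_1)}\,B_1 \quad \text{if } n(B_1) \neq 0 ,$$
$$\cdots$$
$$B_t = A_t - \sum_{i=1}^{t-1} h_{ti}B_i \qquad (1 < t \leq r) ,$$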


4 See Dickson, Modern Algebraic Theories, p. 55; Bôcher, Higher Algebra, p. 36.





where

(2.11)
$$h_{ti} = (A_t, B_i)/n(B_i) \quad \text{if } n(B_i) \neq 0 ,$$
$$h_{ti} = 0 \quad \text{if } n(B_i) = 0 .$$


We proceed to show that this set has the properties stated in the theorem. 

To prove 5.1), let us suppose $t < s$. This assumption is permissible since the
operator ( , ) has the symmetric property. First, if $A_1 = 0$, then $B_1 = 0$, and

$$(B_1, B_2) = (A_1, A_2) = (0, A_2) = 0 .$$

Secondly, if $A_1 \neq 0$, then $B_1 \neq 0$ and

$$(B_1, B_2) = (A_1, A_2 - h_{21}B_1) = (A_1, A_2) - (A_1, B_1)\frac{(A_2, B_1)}{n(B_1)}$$
$$= (A_1, A_2) - (A_1, A_1)(A_2, A_1)/n(A_1) = 0 .$$

Assume 5.1) is true for the vectors $B_1, \ldots, B_{s-1}$; then, for $t < s$,

$$(B_t, B_s) = \Bigl(B_t, A_s - \sum_{i=1}^{s-1} h_{si}B_i\Bigr) = (B_t, A_s) - \sum_{i=1}^{s-1} h_{si}(B_t, B_i) .$$

The sum on the right hand side reduces to $h_{st}(B_t, B_t)$, since the other terms
vanish by assumption. Now if $(B_t, B_t) \neq 0$ then by (2.11), $h_{st}(B_t, B_t) = (A_s, B_t)$,
and by the symmetric property of ( , ), we obtain

$$(B_t, B_s) = (B_t, A_s) - (A_s, B_t) = 0 .$$

If $(B_t, B_t) = 0$, then by the $P_0$-property of ( , ), we find that $B_t$ is a zero
vector, and hence $(B_t, B_s) = 0$.

5.2) follows from the definition of $B_t$.

That $(A_t - B_t)$ is a linear combination of $B_1, \ldots, B_{t-1}$ for $t > 1$ follows
from the definition of $B_t$. Since each $B_s$ ($s < t$) is a linear combination of $A_1, \ldots, A_{t-1}$,
we secure the second part of 5.3).

By 5.2), we can determine $g_{si}$ such that $A_s = \sum_{i=1}^{s} g_{si}B_i$. Thus for every
$s < t$, we have by 5.1)

$$(A_s, B_t) = \Bigl(\sum_{i=1}^{s} g_{si}B_i, B_t\Bigr) = \sum_{i=1}^{s} g_{si}(B_i, B_t) = 0 .$$

By 5.3), there exist $g_{ti}$ such that $A_t - B_t = \sum_{i=1}^{t-1} g_{ti}B_i$, and hence
$A_t = B_t + \sum_{i=1}^{t-1} g_{ti}B_i$. Thus by 5.1), we have

$$(A_t, B_t) = \Bigl(B_t + \sum_{i=1}^{t-1} g_{ti}B_i, B_t\Bigr) = (B_t, B_t) + \sum_{i=1}^{t-1} g_{ti}(B_i, B_t) = (B_t, B_t) .$$

By the symmetric property of ( , ), we secure $(A_t, B_t) = (B_t, A_t)$.





For the proof of uniqueness, let us suppose there exists a second set of vectors
$B'_1, \ldots, B'_r$ having the properties 5.1), 5.2), 5.3), 5.4), and 5.5). By 5.3), we
see that $B_1 = A_1 = B'_1$. Assuming the uniqueness holds true for $r = t$, we
proceed to show that it is also true for $r = t + 1$. By 5.3) there exist constants
$s_i$, $s'_i$ ($i = 1, \ldots, t$) such that

$$B_{t+1} = A_{t+1} + \sum_{i=1}^{t} s_iA_i ,$$
$$B'_{t+1} = A_{t+1} + \sum_{i=1}^{t} s'_iA_i .$$

Thus

$$B_{t+1} - B'_{t+1} = \sum_{i=1}^{t} (s_i - s'_i)A_i .$$

From this, we secure

$$(B_{t+1} - B'_{t+1}, B_{t+1} - B'_{t+1}) = \Bigl(B_{t+1} - B'_{t+1}, \sum_{i=1}^{t} (s_i - s'_i)A_i\Bigr)$$
$$= \sum_{i=1}^{t} (s_i - s'_i)(B_{t+1} - B'_{t+1}, A_i) = 0 ,$$

by virtue of 5.4). Hence by the $P_0$-property of ( , ), we have $B_{t+1} - B'_{t+1} = 0$
and hence $B_{t+1} = B'_{t+1}$.

The set $B_1, \ldots, B_r$ with the properties stated in Theorem 5 is called the
orthogonalized set of $A_1, \ldots, A_r$. This process is called Gram-Schmidt’s
orthogonalization process.
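
The construction (2.1) with the coefficients (2.11) can be sketched as below. The function names `inner` and `orthogonalize` are illustrative assumptions; the coefficient is taken as zero whenever a previously constructed vector has vanishing norm, as (2.11) prescribes, so the sketch also accepts linearly dependent input. Exact rational arithmetic is used so that the orthogonality relations can be checked with equality.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product (U, V) of two vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def orthogonalize(A):
    """Return B_1, ..., B_r built from A_1, ..., A_r by (2.1) and (2.11)."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            n_prev = inner(prev, prev)
            # (2.11): h = (A_t, B_i) / n(B_i) if n(B_i) != 0, and 0 otherwise.
            h = inner(a, prev) / n_prev if n_prev != 0 else Fraction(0)
            b = [bi - h * pi for bi, pi in zip(b, prev)]
        B.append(b)
    return B

A = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
B = orthogonalize(A)
# Property 5.1): distinct B's are orthogonal.
print(all(inner(B[t], B[s]) == 0 for t in range(3) for s in range(t)))
# Property 5.5): (A_t, B_t) = (B_t, B_t).
print(all(inner(A[t], B[t]) == inner(B[t], B[t]) for t in range(3)))
```

When the input is dependent, for example (1, 1) followed by (2, 2), the second orthogonalized vector is the zero vector.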

The set $B_1, \ldots, B_r$ is called the normally orthogonalized set of $A_1, \ldots, A_r$ in
case the former set enjoys the properties 5.1), 5.2), 5.3), 5.4), and if

5.5n) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t) = 1$ for every $t$.

Theorem 6. If a subset $A_{k_1}, \ldots, A_{k_m}$ ($1 \leq k_1 \leq \cdots \leq k_m \leq r$) in the set
$A_1, \ldots, A_r$ is linearly independent, then there is a subset $B_{k_1}, \ldots, B_{k_m}$ which
has the properties stated in Theorem 5, and it is also linearly independent.

Let $h = k_m - k_1 + 1$. To prove the theorem, we may assume $k_1, \ldots, k_m$ to
be $1, \ldots, h \leq r$, for otherwise, we may renumber the vectors. We construct
the $B$ vectors in the same way as given in equations (2.1) and (2.11). By
Theorem 5, we have

(2.2) $B_1 = A_1$, $\quad B_s = A_s + \sum_{i=1}^{s-1} g_{si}A_i$ $\quad (s = 2, \ldots, h)$.

Suppose the constants $c_1, \ldots, c_h$ be such that

$$c_1B_1 + \cdots + c_hB_h = 0 .$$





Then by (2.2), we secure

$$0 = c_1A_1 + \sum_{s=2}^{h} c_sB_s = c_1A_1 + \sum_{s=2}^{h} c_s\Bigl(A_s + \sum_{i=1}^{s-1} g_{si}A_i\Bigr)$$
$$= (c_1 + c_2g_{21} + \cdots + c_hg_{h1})A_1 + (c_2 + c_3g_{32} + \cdots + c_hg_{h2})A_2 + \cdots + c_hA_h .$$

Since $A_1, \ldots, A_h$ are linearly independent, we have

(2.3)
$$c_1 + c_2g_{21} + \cdots + c_hg_{h1} = 0 ,$$
$$c_2 + c_3g_{32} + \cdots + c_hg_{h2} = 0 ,$$
$$\cdots$$
$$c_h = 0 .$$

But the determinant of the coefficients of $c_i$ ($i = 1, \ldots, h$) is

$$\begin{vmatrix} 1 & g_{21} & g_{31} & \cdots & g_{h1} \\ 0 & 1 & g_{32} & \cdots & g_{h2} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix} = 1 .$$

Hence by a theorem in the theory of equations,⁵ the only solution that satisfies
(2.3) is $c_1 = c_2 = \cdots = c_h = 0$. Thus the subset $B_1, \ldots, B_h$ is linearly
independent.

Corollary. The orthogonalized set $B_1, \ldots, B_r$ is linearly independent if and
only if the set $A_1, \ldots, A_r$ is linearly independent.

Theorem 7a. If a set of vectors $A_1, \ldots, A_r$ is linearly independent, then the
set can be normally orthogonalized.

Let $B_i$ be the orthogonalized set of $A_i$. Since $A_i$ is a linearly independent set,
then the set $B_i$ is also linearly independent by Theorem 6. Hence by Theorem
2, the norm of every vector $B_i$ is non-vanishing. Define $C_i = B_i/\mathrm{mod}(B_i)$.
Then this set $C_i$ enjoys the properties 5.1), 5.2), 5.3), 5.4) and 5.5n).

Theorem 7b. If a set of vectors $V_1, \ldots, V_r$ is normally orthogonal, i.e. if

(2.4)
$$(V_i, V_j) = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j), \end{cases}$$

then $V_1, \ldots, V_r$ is linearly independent.

For suppose

$$c_1V_1 + \cdots + c_rV_r = 0 .$$

Then

$$\sum_{i=1}^{r} c_i(V_i, V_j) = 0 , \qquad (j = 1, 2, \ldots, r) .$$


5 Dickson, First Course in the Theory of Equations (1922), p. 119.





By condition (2.4), the preceding expression reduces to

$$c_j = 0 , \qquad (j = 1, 2, \ldots, r) ,$$

which shows the linear independence of $V_1, \ldots, V_r$.

III. Algebraic Derivation of the Normal Equations 

Consider a linear function

(3.1) $l = p_1x_1 + p_2x_2 + \cdots + p_rx_r = \sum_{i=1}^{r} p_ix_i$.

Let the set of observations of $x_i$ and $l$ be

(3.2) $A_i = (a_{i1}, \ldots, a_{in})$, $L = (l_1, \ldots, l_n)$ $\quad (i = 1, \ldots, r;\ n > r)$

respectively, then the residual $v_j$ is

$$v_j = \sum_{i=1}^{r} p_ia_{ij} - l_j , \qquad (j = 1, \ldots, n) .$$

In vector notation,

$$V = \sum p_iA_i - L .$$

The theory of least squares requires us to find the values for $p_1, \ldots, p_r$ so as to
make $(V, V)$ a minimum, or

(3.3°) $(\sum p_iA_i - L, \sum p_iA_i - L)$ = a minimum.

Let $A_1, \ldots, A_r$ be linearly independent. By Theorem 7a, the vectors $A_1, \ldots, A_r$
can be normally orthogonalized. Let $C_1, \ldots, C_r$ be the normally orthogonal
set. Then every $A_t$ ($t = 1, \ldots, r$) is expressible as a linear combination
of $C_1, \ldots, C_t$. Let us write

(3.3) $\sum_{i=1}^{r} p_iA_i = \sum_{i=1}^{r} k_iC_i$.

Our problem now is equivalent to that of finding the values $k_i$ ($i = 1, \ldots, r$) so as
to render the inner product

(3.4) $(\sum k_iC_i - L, \sum k_iC_i - L)$

a minimum. Expression (3.4) can be written in the form

(3.5)
$$(L, L) - 2\sum (L, C_i)k_i + \sum (k_iC_i, k_iC_i)$$
$$= (L, L) - 2\sum (L, C_i)k_i + \sum k_i^2$$
$$= (L, L) - \sum (L, C_i)^2 + \sum (k_i - (C_i, L))^2 .$$

Hence (3.4) gives a minimum if and only if the last summation vanishes, i.e.,

(3.6) $k_i = (C_i, L)$ $\quad (i = 1, \ldots, r)$.



The Bessel's inequality

$$\sum_{i=1}^{r} k_i^2 \leq (L, L)$$

is obtained from (3.6), (3.4), and (3.5).

To solve for $p_i$, we make use of (3.3) and (3.6), and secure

$$\sum_{i=1}^{r} A_ip_i = \sum_{i=1}^{r} (C_i, L)C_i ,$$

whence

$$\Bigl(C_k, \sum A_ip_i\Bigr) = \Bigl(C_k, \sum (C_i, L)C_i\Bigr) .$$

On the right hand side we have

$$\Bigl(C_k, \sum (C_i, L)C_i\Bigr) = \sum (C_i, L)(C_k, C_i) = (C_k, L) ,$$

since $(C_k, C_i) = 0$ when $i \neq k$, and $(C_k, C_i) = 1$ when $i = k$. On the left hand
side, we have

$$\Bigl(C_k, \sum A_ip_i\Bigr) = \sum_{i=1}^{r} (C_k, A_i)p_i = \sum_{i=k}^{r} (C_k, A_i)p_i ,$$

since $(C_k, A_j) = 0$ when $j < k$. Hence the values for $p_1, \ldots, p_r$ are given by

(3.7) $\sum_{i=k}^{r} (C_k, A_i)p_i = (C_k, L)$ $\quad (k = 1, \ldots, r)$,

where $(C_i, A_i) = (C_i, C_i) = 1$.

Equations (3.7) are called the normal equations, which are derived without
using any notion in differential calculus.

From (3.6) and (3.5), we secure the value for the ‘quadratic residual’ $(V, V)$:

(3.8) $(V, V) = (L, L) - \sum_{i=1}^{r} (L, C_i)^2$,

which is a positive quantity by virtue of the Bessel's inequality.

Let $B_1, \ldots, B_r$ be an orthogonalized set of $A_1, \ldots, A_r$. Then every vector
$B_i$ has a non-vanishing norm, and $B_i = \mathrm{mod}(B_i)C_i$. Hence from (3.7) and
(3.8), we have

(3.7°) $\sum_{i=k}^{r} (B_k, A_i)p_i = (B_k, L)$, $\quad (k = 1, 2, \ldots, r)$,

(3.8°) $(V, V) = (L, L) - \sum_{i=1}^{r} (L, B_i)^2/n(B_i)$.

»“l 

Thus we have proved the following

Theorem 8. Given a linear function (3.1). Let the set of observations of $x_i$
and $l$ be

$A_i = (a_{i1}, \ldots, a_{in})$, $L = (l_1, \ldots, l_n)$ $\quad (i = 1, \ldots, r;\ n \geq r)$





respectively. Let $A_1, \ldots, A_r$ be linearly independent, $B_1, \ldots, B_r$ be the orthogonalized
set, and $C_1, \ldots, C_r$ the normally orthogonalized set of $A_1, \ldots, A_r$.
Then the set of values $p_1, \ldots, p_r$ will minimize (3.3°) if and only if the system of
equations (3.7°) or (3.7) holds true; in other words, $\sum_{i=1}^{r} p_iA_i - L$ is orthogonal
to $C_j$ or to $B_j$ for every $j$. The quadratic residual $(V, V)$ is given by (3.8°) or (3.8).

From (3.7), we can secure the solution for $p_1, \ldots, p_r$ immediately without
further application of the Gauss method of substitution.
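
The procedure of Theorem 8 can be sketched as follows, using the form (3.7°) and (3.8°) so that no square roots are needed. The function names and the data are hypothetical, and the sketch assumes that the observation vectors are linearly independent. Since the system (3.7°) involves only the unknowns with index k and higher in its k-th equation, it is solved from the last equation upward.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product (U, V) of two vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def orthogonalize(A):
    """Orthogonalized set B_1, ..., B_r; assumes A is linearly independent."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            h = inner(a, prev) / inner(prev, prev)
            b = [bi - h * pi for bi, pi in zip(b, prev)]
        B.append(b)
    return B

def least_squares(A, L):
    """Return (p, quadratic residual) from (3.7 deg) and (3.8 deg)."""
    r = len(A)
    B = orthogonalize(A)
    p = [Fraction(0)] * r
    # (3.7 deg): sum over i >= k of (B_k, A_i) p_i = (B_k, L), solved for k = r, ..., 1.
    for k in reversed(range(r)):
        known = sum(inner(B[k], A[i]) * p[i] for i in range(k + 1, r))
        p[k] = (inner(B[k], L) - known) / inner(B[k], A[k])
    # (3.8 deg): (V, V) = (L, L) - sum of (L, B_i)^2 / n(B_i).
    residual = inner(L, L) - sum(inner(L, b) ** 2 / inner(b, b) for b in B)
    return p, residual

# Hypothetical data: fit l = p_1 + p_2 t to four observations at t = 0, 1, 2, 3.
p, vv = least_squares([(1, 1, 1, 1), (0, 1, 2, 3)], (1, 3, 5, 8))
print(p, vv)
```

For these data the sketch gives the parameters 4/5 and 23/10 with quadratic residual 3/10, in agreement with the usual straight-line formulas.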

The proof of the following theorem does not make use of the orthogonalization
process.⁶

Theorem 8°. Let $F = \sum p_iA_i$, where every $A_i$ is not a zero vector. The set of
values $p_1, \ldots, p_r$ will minimize (3.3°) if and only if $(F - L, A_i) = 0$ for every
$i$, i.e., $F - L$ is orthogonal to every $A_i$.

The condition is necessary. To prove this, we show that if $(F - L, A_i) = 0$ does
not hold for every $i$, then we can find another set $q_1, \ldots, q_r$ such that $n(F - L) >
n(G - L)$, where $G = \sum q_iA_i$. For in that case we can
find a vector $A_s$ such that $(F - L, A_s) \neq 0$. Since $A_s \neq 0$, we let $e =
(F - L, A_s)/n(A_s)$ and $G = F - eA_s = \sum q_iA_i$. Then

$$n(G - L) = n(F - eA_s - L) = n(F - L) - (F - L, A_s)^2/n(A_s) ,$$

which shows that $n(G - L) < n(F - L)$.

To prove the sufficiency, we show that for every set $q_1, \ldots, q_r$ different from
$p_1, \ldots, p_r$, $n(G - L) \geq n(F - L)$, where $G = \sum q_iA_i$. Let $s_i = q_i - p_i$,
and $H = \sum s_iA_i$. Then $G = F + H$. Now if $(F - L, A_i) = 0$ for every $i$, it
follows that

$$(F - L, H) = \sum_{i=1}^{r} (F - L, s_iA_i) = 0 .$$

Thus

$$n(G - L) = n(F - L) + n(H) .$$

Since $n(H) \geq 0$, we have $n(G - L) \geq n(F - L)$.

The preceding theorem does not require the linear independence of the
vectors $A_1, \ldots, A_r$. By Theorem 7a and 7b we see that it is necessary and
sufficient for the set $A_1, \ldots, A_r$ to be linearly independent in order to solve the
equations $(F - L, A_i) = 0$, $(i = 1, 2, \ldots, r)$, or

(3.9)
$$(A_1, A_1)p_1 + (A_1, A_2)p_2 + \cdots + (A_1, A_r)p_r = (A_1, L)$$
$$\cdots$$
$$(A_r, A_1)p_1 + (A_r, A_2)p_2 + \cdots + (A_r, A_r)p_r = (A_r, L) .$$
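
To make the system (3.9) concrete, the following sketch forms its coefficients and right-hand sides from the observation vectors and checks the orthogonality condition of Theorem 8° for a proposed set of values. The names `normal_equations` and `residual_is_orthogonal` and the data are illustrative assumptions only.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product (U, V) of two vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def normal_equations(A, L):
    """Coefficients (A_i, A_j) and right-hand sides (A_i, L) of system (3.9)."""
    coefficients = [[inner(ai, aj) for aj in A] for ai in A]
    right_sides = [inner(ai, L) for ai in A]
    return coefficients, right_sides

def residual_is_orthogonal(A, L, p):
    """Theorem 8 deg: is F - L orthogonal to every A_i, where F = sum p_i A_i?"""
    n = len(L)
    F = [sum(Fraction(pi) * Fraction(a[j]) for pi, a in zip(p, A)) for j in range(n)]
    V = [f - Fraction(l) for f, l in zip(F, L)]
    return all(inner(V, a) == 0 for a in A)

A, L = [(1, 1, 1, 1), (0, 1, 2, 3)], (1, 3, 5, 8)
print(normal_equations(A, L))
print(residual_is_orthogonal(A, L, [Fraction(4, 5), Fraction(23, 10)]))
```

For these hypothetical data the values 4/5 and 23/10 satisfy the condition, while a different pair such as 1 and 2 does not.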

6 The proof is based on the same type of reasoning as used by Jackson. See Dunham
Jackson’s Theory of Approximation, pp. 151-152.






If $A_1, \ldots, A_r$ are linearly independent, the conclusion in Theorem 8° can be
deduced from Theorem 8. For by Theorem 7a, $A_i = \sum_t s_{it}C_t$, and hence

$$(F - L, A_i) = \Bigl(F - L, \sum_t s_{it}C_t\Bigr) = \sum_t s_{it}(F - L, C_t) = 0 .$$

Also, Theorem 8 can be deduced from Theorem 8°.


IV. Matrices and Their Reciprocals 

An ordered array of numbers of the form

(4.1)
$$\alpha = (a_{ij}) = \begin{pmatrix} a_{11}, & a_{12}, & \cdots, & a_{1m} \\ a_{21}, & a_{22}, & \cdots, & a_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1}, & a_{n2}, & \cdots, & a_{nm} \end{pmatrix}$$

is a matrix. If we write $\alpha(i, j) = a_{ij}$, then the array of numbers (4.1) may be
considered as a function of two variables $i$, $j$ on the ranges of positive integers
$(1, 2, \ldots, n)$, $(1, 2, \ldots, m)$.⁷ Thus a vector is a special instance of a matrix.
We shall use Greek letters to denote matrices throughout this paper unless otherwise
specified. When $n = m$, i.e. the number of rows is the same as the number
of columns, we have a square matrix. Associated with every $n$-row square
matrix, $\kappa$, a determinant can be defined, and for simplicity, we shall adopt the
following notation:

$$D(\kappa) = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} .$$

An identity matrix, denoted by $\delta = (d_{ij})$, is a square matrix of which the
elements in the principal diagonal are 1 and elsewhere 0, i.e. $d_{ij} = 0$ $(i \neq j)$,
$d_{ii} = 1$. A zero matrix, indicated by $\omega$, is one such that every one of its elements
is 0. The transposed matrix, $\alpha'$, of $\alpha$ is formed by interchanging the
rows and columns. We say two matrices $\alpha = (a_{ij})$ and $\beta = (b_{ij})$ are equal in
case $a_{ij} = b_{ij}$ for every $i$, $j$. A matrix $\alpha$ is symmetric in case $\alpha' = \alpha$. The $i$th
column of $\alpha$ is indicated by $\alpha(., i)$, the $i$th row of $\beta$ by $\beta(i, .)$ and the element in
the $i$th row and $j$th column by $\alpha(i, j)$. Hence $\alpha(i, j) = a_{ij}$.

Addition: Let $\alpha$ be a matrix given by (4.1) and $\beta = (b_{ij})$ a matrix of the same
number of rows and columns as $\alpha$. Then

$$\alpha + \beta = (a_{ij} + b_{ij}) .$$


7 E. H. Moore defines a matrix as a function of two variables. 




We note that $\alpha + \beta = \beta + \alpha$. If $\gamma$ is a matrix of the same number of rows and
columns as $\alpha$, then $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$.

Multiplication: Let $\alpha = (a_{ij})$ be defined by (4.1), and $\beta = (b_{jk})$ be a matrix
of $m$ rows and $r$ columns, then the product $\gamma = \alpha\beta$ is defined by

$$\gamma = \Bigl(\sum_{j=1}^{m} a_{ij}b_{jk}\Bigr) .$$

Thus $\gamma$ is a matrix of $n$ rows and $r$ columns.

The multiplication of two matrices is not necessarily commutative.

If $\alpha$ is a matrix of $n$ rows and $m$ columns, $\beta$ of $m$ rows and $r$ columns, and $\gamma$ of
$r$ rows and $s$ columns, then $\alpha(\beta\gamma) = (\alpha\beta)\gamma$. If $\alpha$ is a matrix of $n$ rows and $m$
columns, and $\beta$, $\gamma$ are matrices of $m$ rows and $r$ columns, then $\alpha(\beta + \gamma) =
\alpha\beta + \alpha\gamma$.
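
A minimal sketch of the product just defined is given below, with matrices represented as lists of rows. The function name `matmul` and the sample matrices are assumptions made for illustration; the second example shows a pair of square matrices whose two products differ.

```python
def matmul(alpha, beta):
    """Product of an n-by-m matrix alpha and an m-by-r matrix beta (lists of rows)."""
    n, m = len(alpha), len(alpha[0])
    if len(beta) != m:
        raise ValueError("columns of the first matrix must equal rows of the second")
    r = len(beta[0])
    # Element (i, k) of the product is the sum over j of a_ij * b_jk.
    return [[sum(alpha[i][j] * beta[j][k] for j in range(m)) for k in range(r)]
            for i in range(n)]

alpha = [[1, 2], [3, 4], [5, 6]]   # 3 rows, 2 columns
beta = [[1, 0, 2], [0, 1, 3]]      # 2 rows, 3 columns
print(matmul(alpha, beta))         # a matrix of 3 rows and 3 columns

# Multiplication is not necessarily commutative:
x, y = [[0, 1], [0, 0]], [[0, 0], [1, 0]]
print(matmul(x, y), matmul(y, x))
```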

Scalar Multiplication: Let $s$ be a number, and $\alpha$ be a matrix of $n$ rows and
$m$ columns, then

$$s \cdot \alpha = (sa_{ij}) = \alpha \cdot s .$$

Let $\delta_s$ denote a square matrix of $n$ rows in which the elements in the principal
diagonal are $s$, and 0 elsewhere. Then $\delta_s = s\delta$, where $\delta$ is an $n$-row identity
matrix. We note from the associative law of multiplication that

$$s\alpha = \delta_s \cdot \alpha = \alpha \cdot \delta_s .$$

In particular, let $s = -1$, then we have $-1\alpha$. For convenience, we write
$-\alpha = -1\alpha$. From the definition of addition, we obtain a definition of subtraction
for two matrices of the same number of rows and columns.

Reciprocals of Matrices: Let $\alpha$ be a matrix of $n$ rows and $m$ columns.
Then a matrix $\alpha^{-1}$ of $m$ rows and $n$ columns is said to be a reciprocal of $\alpha$ in case

$$\alpha \cdot \alpha^{-1} = \delta_n , \quad \text{and} \quad \alpha^{-1} \cdot \alpha = \delta_m ,$$

where $\delta_n$, $\delta_m$ are identity matrices of order $n$, $m$ respectively. If a matrix $\alpha$ has
a reciprocal $\alpha^{-1}$, we can prove $\alpha^{-1}$ is unique. It can be shown that when $\alpha$ has a
reciprocal, it must be a square matrix.⁸

A matrix is said to be non-singular in case it has a reciprocal, otherwise it is 
said to be singular . 9 It is evident that every zero matrix is singular, and an 
identity matrix is non-singular. 

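Suppose $\alpha$ is a square matrix of order $n$. Let us denote the cofactor of the
element $a_{ij}$ of $\alpha$ by $e_{ji}$. Then

$$\epsilon = (e_{ij}) = \begin{pmatrix} e_{11}, & \cdots, & e_{1n} \\ \cdots & \cdots & \cdots \\ e_{n1}, & \cdots, & e_{nn} \end{pmatrix}$$

is called the adjoint matrix of $\alpha$.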


8 For the proof of this statement, see Moore, Vector, Matrices, and Quaternions.

9 This definition is due to E. H. Moore.






If $\alpha$ is symmetric, then $\epsilon$ is also symmetric. Since $a_{i1}e_{1j} + \cdots + a_{in}e_{nj} =
D(\alpha)$ or 0 according as $i = j$ or $i \neq j$, we secure the following:

Theorem 9. Let $\alpha$ be a square matrix and $\epsilon$ its adjoint, then

$$\alpha\epsilon = \epsilon\alpha = [D(\alpha)]\delta .$$

Theorem 10. If the determinant of $\alpha$ is different from zero, then there exists a
reciprocal $\alpha^{-1}$, and $\alpha^{-1} = \mathrm{adj}\,\alpha/D(\alpha)$.

This theorem follows from Theorem 9.

The converse of Theorem 10 is also true.
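
Theorems 9 and 10 can be checked numerically on a small matrix. In the sketch below the helper names `minor`, `det`, `adjoint`, `reciprocal`, and `matmul` are hypothetical, and the determinant is computed by cofactor expansion along the first row, which is adequate only for small orders.

```python
from fractions import Fraction

def minor(a, i, j):
    """Matrix obtained from a by striking out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def adjoint(a):
    """Adjoint matrix: element (i, j) is the cofactor of a_ji."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)] for i in range(n)]

def reciprocal(a):
    """Theorem 10: the reciprocal is the adjoint divided by the determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("the matrix is singular")
    return [[Fraction(e) / d for e in row] for row in adjoint(a)]

def matmul(x, y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x[i][j] * y[j][k] for j in range(len(y))) for k in range(len(y[0]))]
            for i in range(len(x))]

alpha = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
print(det(alpha))
print(matmul(alpha, adjoint(alpha)))     # Theorem 9: D(alpha) times the identity
print(matmul(alpha, reciprocal(alpha)))  # the identity matrix
```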

V. Symmetric Matrices of Positive Type 10 

Let $\alpha = (a_{ij})$ be a matrix of $n$ rows and $m$ columns; and let $\sigma = (k_1, \ldots, k_n)$
and $\rho = (h_1, \ldots, h_m)$ be integers among the sets $(1, \ldots, n)$ and $(1, \ldots, m)$
respectively. The subsets $\sigma$ and $\rho$ may be equal to the whole sets $(1, \ldots, n)$
and $(1, \ldots, m)$ respectively. Then

(3)
$$\alpha(\sigma, \rho) = \begin{pmatrix} a_{k_1h_1} & \cdots & a_{k_1h_m} \\ \cdots & \cdots & \cdots \\ a_{k_nh_1} & \cdots & a_{k_nh_m} \end{pmatrix}$$

is called a minor of $\alpha$. In notation we write this minor as $\alpha(\sigma, \rho)$, indicating the
ranges to be $\sigma$ and $\rho$.

The minor $\alpha(-\sigma, -\rho)$, which is obtained by striking out all the $k_i$th ($i = 1, \ldots, n$)
rows and $h_j$th ($j = 1, \ldots, m$) columns from $\alpha$, is called the complementary
minor of $\alpha(\sigma, \rho)$.

If a is a square matrix of order n, then a(o, 0 ) is called a principal minor of a. 

Let a and P be matrices of n rows and m columns; and let a, p have the same 
meaning as above. Then a(o, p), fi(o, p) are called corresponding minors in 
a, P respectively. 

A symmetric matrix a = (a tJ ) of order n is said to be of positive type in case 
the determinant of every principal minor of a is positive, and is said to be of properly 
positive type in case the determinant of every principal minor of a is greater than 
zero. 
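The two definitions can be tested directly by enumerating principal minors. The sketch below is an illustration only: the function names and both example matrices are assumptions, and "positive" is read as "not negative", as in the definition above.

```python
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def principal_minor_determinants(a):
    """Map each non-empty index set sigma to the determinant of a(sigma, sigma)."""
    n = len(a)
    result = {}
    for size in range(1, n + 1):
        for sigma in combinations(range(n), size):
            result[sigma] = det([[a[i][j] for j in sigma] for i in sigma])
    return result

def is_positive_type(a):
    """Every principal minor has determinant >= 0."""
    return all(v >= 0 for v in principal_minor_determinants(a).values())

def is_properly_positive_type(a):
    """Every principal minor has determinant > 0."""
    return all(v > 0 for v in principal_minor_determinants(a).values())

a = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # properly positive (illustrative)
b = [[1, 1], [1, 1]]                    # positive, but not properly positive
minors_of_a = principal_minor_determinants(a)
```

A matrix of order n has 2^n - 1 principal minors, so this brute-force check is practical only for small n.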

Corollary V1. Every element in the principal diagonal of a positive, symmetric
matrix is positive.

For, let $\sigma$ consist of a single integer $i$, then $\alpha(\sigma, \sigma) = a_{ii} \geq 0$.

Corollary V2. If a symmetric matrix is properly positive, then every element
in the principal diagonal is greater than 0.

Theorem 11. If a symmetric matrix $\alpha$ of order $n$ is (properly) positive, then its
adjoint matrix $\epsilon$ is also symmetric and (properly) positive.

10 We follow the terminology of E. H. Moore. Moore developed this notion quite
extensively.



ORTHOGONALIZATION PROCESS AND THEORY OF LEAST SQUARES 




The symmetry of $\epsilon$ is evident. Let $\sigma$ be a subset of $(1, \cdots, n)$ and let $p$ be
the number of integers in $\sigma$. Consider any principal minor $\epsilon(\sigma, \sigma)$ in the adjoint
matrix $\epsilon$. By a theorem in the theory of determinants, we have¹¹

$$D[\epsilon(\sigma, \sigma)] = (-1)^k \cdot D[\alpha(-\sigma, -\sigma)] \cdot [D(\alpha)]^{p-1},$$

where $k$ is an integer depending on the set $\sigma$. By hypothesis $\alpha$ is positive (properly
positive); hence $D[\alpha(-\sigma, -\sigma)]$ and $[D(\alpha)]^{p-1}$ are positive (greater than 0),
and it follows that $D[\epsilon(\sigma, \sigma)]$ is positive (greater than 0).

Theorem 12. If a symmetric matrix is properly positive, then $D(\alpha)$ is different
from zero, and $\alpha$ has a reciprocal $\alpha^{-1}$, which is also symmetric and properly positive.

For take $\sigma$ to be the whole set $(1, \cdots, n)$ in the definition of proper
positiveness, and we see that $D(\alpha) \neq 0$. The theorem now follows from Theorems 10
and 11.

VI. Gramian Matrices 

In this section, we shall study the matrices of the normal equations (3.9).
The main result is that if the set of observations $A_1, \cdots, A_r$ is linearly
independent, then the matrix (called Gramian matrix) is properly positive and has a
reciprocal which is also properly positive.

Theorem 13. Let $A_1, \cdots, A_r$ be a set of vectors, and let $B_1, \cdots, B_r$ be the
orthogonalized set of vectors. Then the matrix

$$\Gamma(A_1, \cdots, A_r) = \begin{pmatrix} (A_1, A_1) & \cdots & (A_1, A_r) \\ \vdots & & \vdots \\ (A_r, A_1) & \cdots & (A_r, A_r) \end{pmatrix} \tag{6.1}$$

has the following properties:

13.1) symmetry,

13.2) $D[\Gamma(A_1, \cdots, A_r)] = n(B_1)\,n(B_2) \cdots n(B_r)$,

13.3) positiveness.

A matrix of the form (6.1) is called a Gramian matrix.

In fact, the symmetric property follows from the fact that $(A_i, A_j) = (A_j, A_i)$
for every $i$, $j$.

We shall prove 13.2) by induction. For $r = 1$, we have by Theorem 5

$$(A_1, A_1) = (B_1, B_1) = n(B_1).$$

Assume the equality is true for $r = t$, we shall show it is true for $r = t + 1$.
The $(t + 1)$-row determinant is as follows:

$$D[\Gamma(A_1, \cdots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\ \vdots & & \vdots & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\ (A_1, A_{t+1}) & \cdots & (A_t, A_{t+1}) & (A_{t+1}, A_{t+1}) \end{vmatrix} \tag{6.2}$$

11 In case $\sigma = (1, \cdots, n)$, $-\sigma$ is a null class $\Lambda$ (a class which contains no element); then
we define $D[\alpha(-\sigma, -\sigma)] = 1$. For the proof of this theorem, see Bôcher, p. 31.







By Theorem 5, there exist constants $s_i$ ($i = 1, \cdots, t$) such that

$$A_{t+1} = B_{t+1} + \sum_{i=1}^{t} s_i A_i.$$

Substituting this value into the last row, we find the element in the $i$th column is

$$(A_i, A_{t+1}) = \Big(A_i,\ B_{t+1} + \sum_{j=1}^{t} s_j A_j\Big) = (A_i, B_{t+1}) + \sum_{j=1}^{t} s_j (A_i, A_j) \qquad (i = 1, \cdots, t, t+1).$$

The second term on the right is a linear combination of the first $t$ elements in the
$i$th column of the determinant (6.2) and hence by the theory of determinants,¹²
we secure

$$D[\Gamma(A_1, \cdots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\ \vdots & & \vdots & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\ (A_1, B_{t+1}) & \cdots & (A_t, B_{t+1}) & (A_{t+1}, B_{t+1}) \end{vmatrix}.$$

By Theorem 5, we find that $(A_i, B_{t+1}) = 0$ for $i = 1, \cdots, t$, and $(A_{t+1}, B_{t+1})
= (B_{t+1}, B_{t+1}) = n(B_{t+1})$, and hence the preceding determinant reduces to a form in which
the first $t$ elements in the $(t + 1)$th row are zero. Thus

$$D[\Gamma(A_1, \cdots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) \\ \vdots & & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) \end{vmatrix} \cdot n(B_{t+1}) = n(B_1)\,n(B_2) \cdots n(B_t)\,n(B_{t+1}),$$

which proves 13.2).
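Property 13.2) lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: the three vectors are arbitrary, $n(B)$ is taken to be $(B, B)$ as in the case $r = 1$ above, and the orthogonalization is the usual Gram-Schmidt recursion in exact rational arithmetic.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product (u, v)."""
    return sum(x * y for x, y in zip(u, v))

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def orthogonalize(vectors):
    """Gram-Schmidt: B_k = A_k - sum_i (A_k, B_i)/(B_i, B_i) * B_i."""
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for q in basis:
            norm = inner(q, q)
            if norm != 0:                      # skip directions that vanished
                c = inner(a, q) / norm
                b = [x - c * y for x, y in zip(b, q)]
        basis.append(b)
    return basis

vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]    # illustrative A_1, A_2, A_3
gramian = [[inner(u, v) for v in vectors] for u in vectors]
norms = [inner(b, b) for b in orthogonalize(vectors)]   # n(B_1), n(B_2), n(B_3)

product_of_norms = Fraction(1)
for value in norms:
    product_of_norms *= value
```

For these vectors the Gramian is [[2, 1, 1], [1, 2, 1], [1, 1, 2]], and its determinant agrees with the product of the three norms.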

Consider any subset $\sigma = (k_1, \cdots, k_m)$ of the set $(1, \cdots, r)$. By the same
argument as above, we find that the determinant of any principal minor

$$\begin{vmatrix} (A_{k_1}, A_{k_1}) & \cdots & (A_{k_1}, A_{k_m}) \\ \vdots & & \vdots \\ (A_{k_m}, A_{k_1}) & \cdots & (A_{k_m}, A_{k_m}) \end{vmatrix} = n(B_{k_1}) \cdots n(B_{k_m}). \tag{6.3}$$

By Theorem 1, the number on the right is positive. Thus the matrix $\Gamma$ is
positive.

Theorem 14. The following three assertions are equivalent : 

14.1) the set $A_1, \cdots, A_r$ is linearly independent;

14.2) the Gramian matrix (6.1) is properly positive ; 

14.3) The determinant of the Gramian matrix (6.1) is different from zero. 

We shall prove that 14.1) implies 14.2); 14.2) implies 14.3); and 14.3) implies 
14.1). We thus prove the three statements are equivalent. 


12 Dickson, First Course in the Theory of Equations (1922), p, 113. 






Let $B_1, \cdots, B_r$ be the orthogonalized set of the set $A_1, \cdots, A_r$. Since the
set $A_1, \cdots, A_r$ is linearly independent, then every subset

$$A_{k_1}, \cdots, A_{k_m} \qquad (1 \leq k_1 \leq \cdots \leq k_m \leq r)$$

is also linearly independent, and hence $n(B_{k_i}) > 0$ for $i = 1, 2, \cdots, m$. By the
same argument as given in the demonstration of Theorem 11, we find that the
determinant of any principal minor (6.3) is greater than zero. This proves the
matrix (6.1) is properly positive.

If the matrix (6.1) is properly positive, then by Theorem 10 the determinant
of (6.1) is different from zero.

To prove 14.3) implies 14.1), suppose $k_i$ ($i = 1, \cdots, r$) are such that

$$k_1A_1 + \cdots + k_rA_r = 0.$$

Then

$$(k_1A_1 + \cdots + k_rA_r,\ A_i) = k_1(A_1, A_i) + \cdots + k_r(A_r, A_i) = 0$$

for $i = 1, \cdots, r$. Since $(A_i, A_j) = (A_j, A_i)$, and $D(\Gamma) \neq 0$, the set of
constants $k_i$ must be all equal to 0.¹³

From Theorem 14, and Theorem 10, we may state the following
Corollary: If the set of observations $A_1, \cdots, A_r$ is linearly independent, then
the Gramian matrix $\Gamma$ has a reciprocal which is properly positive.

VII. Gauss's Method of Substitution

Lemma 7.1) Let $\varphi = (s_{ij})$ be an $r$-row symmetric matrix such that $s_{11} \neq 0$.
Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that
$\psi = (r_{ij}) = \tau\varphi$ has the following properties:

a) $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for every $i$;

b) the first minor of $r_{11}$ is symmetric;

c) the determinant of every principal minor in $\psi$ of the form

$$\begin{bmatrix} s_{11} & s_{1k_2} & \cdots & s_{1k_m} \\ 0 & r_{k_2k_2} & \cdots & r_{k_2k_m} \\ \vdots & \vdots & & \vdots \\ 0 & r_{k_mk_2} & \cdots & r_{k_mk_m} \end{bmatrix} \qquad (2 \leq k_2 \leq \cdots \leq k_m \leq r) \tag{7.1}$$

is equal to the determinant of the corresponding principal minor in $\varphi$.

To prove this lemma, let us define

$$\tau = \delta + F_1 \cdot D_1, \tag{7.2}$$

where $D_1$ is the first row of an $r$-row identity matrix $\delta$, and $F_1(1) = 0$,

$$F_1(n) = -s_{1n}/s_{11} \qquad (n > 1).$$

(Thus $F_1D_1$ is an $r$-row square matrix in which the first column is $F_1$ and
everywhere else 0.) It is clear that $\tau$ thus defined is a square matrix of order $r$, and


13 See footnote 5.





$D(\tau) = D(\delta + F_1D_1) = 1$. By multiplication of these two matrices, $\tau\varphi$, we
obtain a new matrix such that $r_{11} = s_{11}$, $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for
every $i$, and further

$$r_{ij} = s_{ij} - s_{i1}s_{1j}/s_{11} \qquad \text{for } i > 1,\ j > 1. \tag{7.3}$$

To prove property (b), we note that $s_{ij} = s_{ji}$, since $\varphi$ is symmetric. Thus for
$i > 1$, $j > 1$, we note from (7.3) that

$$r_{ij} = s_{ij} - s_{i1}s_{1j}/s_{11} = s_{ji} - s_{j1}s_{1i}/s_{11} = r_{ji}.$$

For the proof of the last property, we note that the corresponding minor of
(7.1) in $\varphi$ is of the form

$$\begin{bmatrix} s_{11} & s_{1k_2} & \cdots & s_{1k_m} \\ s_{k_21} & s_{k_2k_2} & \cdots & s_{k_2k_m} \\ \vdots & \vdots & & \vdots \\ s_{k_m1} & s_{k_mk_2} & \cdots & s_{k_mk_m} \end{bmatrix} \tag{7.4}$$

Since $\varphi$ is symmetric, we have by (7.3),

$$r_{k_ik_j} = s_{k_ik_j} - s_{k_i1}s_{1k_j}/s_{11} \qquad (i > 1,\ j > 1),$$

$$0 = s_{k_i1} - s_{k_i1}s_{11}/s_{11} \qquad (i > 1).$$

Thus by a theorem in the theory of determinants, the determinants of (7.1) and
(7.4) are equal.
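A single step of the substitution in Lemma 7.1) can be sketched as follows. The matrix `phi` is an illustrative assumption, and so are the helper names; the construction of `tau` follows (7.2), an identity matrix with the multipliers $F_1(n)$ placed in its first column.

```python
from fractions import Fraction

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(x, y):
    """Ordinary row-by-column matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def first_substitution(phi):
    """Return (tau, psi) with tau = identity + F_1 * D_1 and psi = tau * phi."""
    r = len(phi)
    tau = [[Fraction(int(i == j)) for j in range(r)] for i in range(r)]
    for n in range(1, r):
        tau[n][0] = -Fraction(phi[n][0], phi[0][0])   # F_1(n); requires s_11 != 0
    return tau, matmul(tau, phi)

phi = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]               # illustrative symmetric matrix
tau, psi = first_substitution(phi)

first_column_cleared = all(psi[i][0] == 0 for i in range(1, 3))        # property a)
first_row_kept = psi[0] == phi[0]                                      # property a)
minor = [row[1:] for row in psi[1:]]
minor_symmetric = minor == [list(col) for col in zip(*minor)]          # property b)
```

Property c) for the whole matrix reduces to `det(psi) == det(phi)`, which holds because `det(tau)` is 1.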

Lemma 7.2) Let $\varphi = (s_{ij})$ ($i, j = 1, \cdots, r$) be a symmetric matrix of positive
type, and $s_{11} \neq 0$. Then there exists an $r$-row square matrix $\tau$ whose determinant
is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1) and
furthermore the minor of $r_{11}$ in $\psi$ is of positive type.

To prove the positiveness of the minor of $r_{11}$, let the determinant of any one
of its principal minors be

$$M_1 = \begin{vmatrix} r_{k_2k_2} & \cdots & r_{k_2k_m} \\ \vdots & & \vdots \\ r_{k_mk_2} & \cdots & r_{k_mk_m} \end{vmatrix} \qquad (2 \leq k_2 \leq \cdots \leq k_m \leq r),$$

where $r_{k_ik_j} = r_{k_jk_i}$ ($i, j = 2, \cdots, m$) due to the symmetry. Now consider the
bordered determinant

$$M_2 = \begin{vmatrix} r_{11} & r_{1k_2} & \cdots & r_{1k_m} \\ 0 & r_{k_2k_2} & \cdots & r_{k_2k_m} \\ \vdots & \vdots & & \vdots \\ 0 & r_{k_mk_2} & \cdots & r_{k_mk_m} \end{vmatrix},$$

which by property (a) in Lemma 7.1) gives $M_2 = r_{11}M_1 = s_{11}M_1$. By property
(c) in Lemma 7.1), $M_2$ is equal to the determinant of the form (7.4), which by
hypothesis is positive. Thus $s_{11}M_1 \geq 0$. Since $s_{11} > 0$, we conclude that
$M_1 = M_2/s_{11} \geq 0$.




Lemma 7.3). Let $\varphi = (s_{ij})$ ($i, j = 1, 2, \cdots, r$) be a symmetric matrix of
properly positive type. Then there exists an $r$-row square matrix $\tau$ whose
determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1)
and furthermore the minor of $r_{11}$ in $\psi$ is properly positive.

Since $\varphi$ is properly positive, we find that $s_{11} > 0$. The proof of this lemma is
similar to that of Lemma 7.2).

Suppose that the set of observations $A_1, \cdots, A_r$ is linearly independent.
Then by Theorem 14, the Gramian matrix (6.1) is symmetric and properly
positive, and hence $(A_1, A_1) > 0$. By Lemma 7.3), the matrix (6.1) can be
reduced to the form

$$\begin{bmatrix} [A_1A_1\cdot 0] & [A_1A_2\cdot 0] & \cdots & [A_1A_r\cdot 0] \\ 0 & [A_2A_2\cdot 1] & \cdots & [A_2A_r\cdot 1] \\ \vdots & \vdots & & \vdots \\ 0 & [A_rA_2\cdot 1] & \cdots & [A_rA_r\cdot 1] \end{bmatrix} \tag{7.5}$$

where

$$[A_sA_t\cdot 0] = (A_s, A_t),$$

$$[A_sA_t\cdot 1] = \frac{[A_1A_1\cdot 0]\,[A_sA_t\cdot 0] - [A_sA_1\cdot 0]\,[A_1A_t\cdot 0]}{[A_1A_1\cdot 0]}.$$

It is evident that $[A_1A_1\cdot 0] = (A_1, A_1) > 0$, since the matrix (6.1) is properly
positive. By Lemma 7.3) the value of $D(\Gamma)$ and the determinant of (7.5) are
equal, and furthermore the minor of the element $[A_1A_1\cdot 0]$ is a symmetric matrix
of properly positive type. Thus $[A_2A_2\cdot 1] > 0$, and $[A_tA_s\cdot 1] = [A_sA_t\cdot 1]$.

The minor of $[A_1A_1\cdot 0]$ surely satisfies all the conditions in Lemma 7.3). We
may, therefore, apply a transformation of the form (7.2) to the minor of $[A_1A_1\cdot 0]$,
and secure another matrix of the same character as (7.5). In other words, we
may multiply on the left of the matrix (7.5) by

$$\tau_2 = \delta + F_2D_2 \tag{7.6}$$

where $D_2$ is the second row of the $r$ row identity matrix $\delta$, and

$$F_2(n) = 0 \quad (n \leq 2); \qquad F_2(n) = -\frac{[A_2A_n\cdot 1]}{[A_2A_2\cdot 1]} \quad (n > 2).$$

In general, let

$$\tau_i = \delta + F_iD_i \qquad (i = 1, \cdots, r - 1), \tag{7.7}$$

where $D_i$ is the $i$th row of the $r$ row identity matrix $\delta$, and

$$F_i(n) = 0 \quad (n \leq i); \qquad F_i(n) = -\frac{[A_iA_n\cdot i-1]}{[A_iA_i\cdot i-1]} \quad (n > i). \tag{7.8}$$






Continuous application of this type of transformation ultimately reduces the
matrix (6.1) to the form

$$\tau\Gamma = \begin{bmatrix} [A_1A_1\cdot 0] & [A_1A_2\cdot 0] & [A_1A_3\cdot 0] & \cdots & [A_1A_r\cdot 0] \\ 0 & [A_2A_2\cdot 1] & [A_2A_3\cdot 1] & \cdots & [A_2A_r\cdot 1] \\ 0 & 0 & [A_3A_3\cdot 2] & \cdots & [A_3A_r\cdot 2] \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & [A_rA_r\cdot r-1] \end{bmatrix} \tag{7.9}$$

where

$$[A_tA_s\cdot h] = \frac{[A_hA_h\cdot h-1]\,[A_tA_s\cdot h-1] - [A_tA_h\cdot h-1]\,[A_hA_s\cdot h-1]}{[A_hA_h\cdot h-1]} \qquad (t, s = 1, \cdots, r;\ 0 < h \leq sm(t, s)).^{14} \tag{7.9'}$$

In the matrix (7.9), we see by virtue of Lemma 7.3) that $[A_iA_i\cdot i-1] > 0$ for
every $i$, and $[A_tA_s\cdot h] = [A_sA_t\cdot h]$ for every $s$, $t$ and $0 \leq h \leq sm(t, s)$. If
$h = sm(t, s)$, then $[A_tA_s\cdot h] = 0$.

Let $\tau = \tau_{r-1} \cdot \tau_{r-2} \cdots \tau_1$. Then by the associative law of multiplication of
matrices, we see that

$$\tau\Gamma = (\tau_{r-1} \cdots \tau_1)\Gamma. \tag{7.10}$$

Thus we prove

Theorem 15. If the set of vectors $A_1, \cdots, A_r$ is linearly independent, then there
exists a square matrix $\tau$ of order $r$ such that $\tau\Gamma$ is of the form (7.9) where all
elements below the principal diagonal are 0; every element in the principal diagonal
$[A_iA_i\cdot i-1]$ ($i = 1, \cdots, r$), is greater than zero; and $[A_tA_s\cdot h] = [A_sA_t\cdot h]$ for
$s, t = 1, \cdots, r$, and $h < sm(t, s)$. Furthermore the determinants of the matrices
(6.1) and (7.9) are equal.
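The repeated substitution behind Theorem 15 can be sketched as a loop of row operations, each one standing for left multiplication by a matrix of the form (7.7). The Gramian below is the one belonging to the illustrative vectors (1, 1, 0), (1, 0, 1), (0, 1, 1); the function name `reduce_gramian` is an assumption, and the sketch presumes every pivot is different from zero, as it is for a properly positive matrix.

```python
from fractions import Fraction

def matmul(x, y):
    """Ordinary row-by-column matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def reduce_gramian(gamma):
    """Return (tau, reduced) where reduced = tau * gamma is upper triangular.

    Step i adds F_i(n) = -[A_i A_n . i-1] / [A_i A_i . i-1] times row i to each
    later row n; applying the same operations to the identity accumulates
    tau = tau_{r-1} ... tau_1.
    """
    r = len(gamma)
    reduced = [[Fraction(x) for x in row] for row in gamma]
    tau = [[Fraction(int(i == j)) for j in range(r)] for i in range(r)]
    for i in range(r - 1):
        for n in range(i + 1, r):
            f = -reduced[n][i] / reduced[i][i]
            reduced[n] = [x + f * y for x, y in zip(reduced[n], reduced[i])]
            tau[n] = [x + f * y for x, y in zip(tau[n], tau[i])]
    return tau, reduced

gamma = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]      # illustrative Gramian matrix
tau, reduced = reduce_gramian(gamma)
diagonal = [reduced[i][i] for i in range(3)]   # [A_i A_i . i-1], i = 1, 2, 3

determinant_of_reduced = Fraction(1)
for value in diagonal:
    determinant_of_reduced *= value
```

Because the reduced matrix is triangular, its determinant is the product of its diagonal, which here equals the determinant 4 of the Gramian.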

We now prove the following lemma which will be useful in the later section.

Lemma 7.4). If $[A_iA_i\cdot i-1]$ is different from zero for every $i > 0$, then for
every pair of integers $(s, t)$, where $s, t = 1, \cdots, r$, and $n \leq sm(t, s)$, we have

a) $[A_tA_s\cdot n] = (A_t, A_s) - \sum_{i=1}^{n} \dfrac{[A_iA_t\cdot i-1]\,[A_iA_s\cdot i-1]}{[A_iA_i\cdot i-1]}$,

b) $[A_t(A_s + A_u)\cdot n] = [A_tA_s\cdot n] + [A_tA_u\cdot n]$, ($u = 1, \cdots, r$),

c) $[(cA_t)A_s\cdot n] = c\,[A_tA_s\cdot n]$, ($c$ = a constant).

To prove a), take every pair $(s, t)$. We find the lemma is true for $n = 0$.
Assuming it is true for every $(s, t)$ and for $n = h < sm(s, t)$, we find that $h + 1 \leq
sm(s, t)$, and

$$(A_t, A_s) - \sum_{i=1}^{h+1} \frac{[A_iA_t\cdot i-1]\,[A_iA_s\cdot i-1]}{[A_iA_i\cdot i-1]}
= (A_t, A_s) - \sum_{i=1}^{h} \frac{[A_iA_t\cdot i-1]\,[A_iA_s\cdot i-1]}{[A_iA_i\cdot i-1]} - \frac{[A_{h+1}A_t\cdot h]\,[A_{h+1}A_s\cdot h]}{[A_{h+1}A_{h+1}\cdot h]}$$

$$= [A_tA_s\cdot h] - \frac{[A_{h+1}A_t\cdot h]\,[A_{h+1}A_s\cdot h]}{[A_{h+1}A_{h+1}\cdot h]} = [A_tA_s\cdot h+1]$$

for every $s$, $t$.

Parts b) and c) are true for $n = 0$. Now make use of the equality in a) and
prove by induction.

14 $sm(t, s)$ read “the smaller one of $(t, s)$.”


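VIII. Gauss's Method of Substitution and its Relation to Gram-Schmidt's
Orthogonalization Process

Let us write the set of observations in the form:

$$\alpha = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{pmatrix}.$$

Let the orthogonalized set also be written in the form

$$\beta = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{r1} & \cdots & b_{rn} \end{pmatrix}.$$

From Theorems 5 and 6, we find that there exists a transformation $\kappa$ given by an
$r$-row square matrix such that $\beta = \kappa\alpha$. Thus by the associative law of
multiplication of matrices, we have

$$\beta\alpha' = (\kappa\alpha)\alpha' = \kappa(\alpha\alpha').$$

Now the matrix $\alpha\alpha'$ is the Gramian matrix (6.1). Thus

$$\beta\alpha' = \kappa\Gamma. \tag{8.1}$$

The composite matrix $\beta\alpha'$ is of the form

$$\begin{bmatrix} (B_1, A_1) & (B_1, A_2) & \cdots & (B_1, A_r) \\ (B_2, A_1) & (B_2, A_2) & \cdots & (B_2, A_r) \\ \vdots & \vdots & & \vdots \\ (B_r, A_1) & (B_r, A_2) & \cdots & (B_r, A_r) \end{bmatrix} \tag{8.2}$$

By Theorems 5 and 6, we note that $(B_s, A_t) = 0$ for $s > t$, and $(B_s, A_s) =
(B_s, B_s)$ for every $s$. Thus the preceding matrix can be written in the form

$$\begin{bmatrix} (B_1, B_1) & (B_1, A_2) & (B_1, A_3) & \cdots & (B_1, A_r) \\ 0 & (B_2, B_2) & (B_2, A_3) & \cdots & (B_2, A_r) \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & (B_r, B_r) \end{bmatrix} \tag{8.3}$$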



We have proved the following theorem:

Theorem 16. Let $A_1, \cdots, A_r$ be a set of vectors, and $B_1, \cdots, B_r$ be the
orthogonalized set; and let $\alpha = (a_{ij})$, $\beta = (b_{ij})$. Then there exists a square $r$-row
matrix $\kappa$ such that $\beta = \kappa\alpha$, and $\kappa\alpha\alpha'$ is a matrix of the form (8.3) where all the
elements below the principal diagonal are zeros and every element in the principal
diagonal is positive. If the set $A_1, \cdots, A_r$ is linearly independent, then every
element in the principal diagonal is greater than zero.

Theorem 17. Let $A_1, \cdots, A_r$ be a set of vectors and $B_1, \cdots, B_r$ be the
orthogonalized set; and let $\alpha = (a_{ij})$, $\beta = (b_{ij})$. Then $D(\beta\alpha') = D(\alpha\alpha')$.

For by equations (2.1), we note that $D(\kappa) = 1$. Thus

$$D(\beta\alpha') = D(\kappa\alpha\alpha') = D(\kappa)D(\alpha\alpha') = D(\alpha\alpha').$$

Theorem 18. If the set of vectors, $A_1, \cdots, A_r$ is linearly independent, the
matrix $\kappa$ arising from Gram-Schmidt's orthogonalization process is identical with
the matrix $\tau$ defined by (7.10).

To prove this theorem, we first establish the following

Lemma 8.5): If the set $A_1, \cdots, A_r$ be linearly independent, and $B_1, \cdots, B_r$ be
the orthogonalized set, then for every $t$, $h$, we have

$$(B_h, A_t) = [A_hA_t\cdot h-1].$$

By Theorem 10, the set $B_i$ is linearly independent, and hence $n(B_i) > 0$ for
every $i$. The lemma is evidently true for every $t$ and $h = 1$. Assuming it is true
for every $t$ and every $h \leq s$, we shall prove it is also true for every $t$ and $h = s + 1$.
Now

$$B_{s+1} = A_{s+1} - \sum_{i=1}^{s} \frac{(A_{s+1}, B_i)}{(B_i, B_i)}\,B_i = A_{s+1} - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,B_i.$$

Thus by the linear property of ( , ) we secure, for every $t$

$$(B_{s+1}, A_t) = (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,(B_i, A_t)$$

$$= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]\,[A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]} = [A_{s+1}A_t\cdot s]$$

by virtue of Lemma 7.4).

From this lemma, we conclude at once that the matrices (7.9) and (8.3) are
equal. Thus by (8.1), we have

$$\kappa\Gamma = \beta\alpha' = \tau\Gamma, \quad\text{or}\quad (\kappa - \tau)\Gamma = \omega.$$





Since $\Gamma$ is non-singular (by Theorem 12), we have

$$\omega = (\kappa - \tau)\Gamma\Gamma^{-1} = (\kappa - \tau)\delta = \kappa - \tau,$$

which proves the theorem.
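Theorem 18 can be illustrated by computing both matrices for one set of vectors. In the sketch below the vectors, the function names and the bookkeeping that records $\kappa$ during orthogonalization are all assumptions made for illustration; the set is assumed to be linearly independent so that no division by zero occurs.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product (u, v)."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt_with_kappa(vectors):
    """Return (B, kappa) with B = kappa * A, recording each subtraction in kappa."""
    r = len(vectors)
    basis, kappa = [], []
    for k, a in enumerate(vectors):
        b = [Fraction(x) for x in a]
        row = [Fraction(int(j == k)) for j in range(r)]
        for q, q_row in zip(basis, kappa):
            c = inner(a, q) / inner(q, q)
            b = [x - c * y for x, y in zip(b, q)]
            row = [x - c * y for x, y in zip(row, q_row)]
        basis.append(b)
        kappa.append(row)
    return basis, kappa

def substitution_tau(gamma):
    """Accumulate tau = tau_{r-1} ... tau_1 of (7.10) by row operations on gamma."""
    r = len(gamma)
    work = [[Fraction(x) for x in row] for row in gamma]
    tau = [[Fraction(int(i == j)) for j in range(r)] for i in range(r)]
    for i in range(r - 1):
        for n in range(i + 1, r):
            f = -work[n][i] / work[i][i]
            work[n] = [x + f * y for x, y in zip(work[n], work[i])]
            tau[n] = [x + f * y for x, y in zip(tau[n], tau[i])]
    return tau

vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]            # illustrative A_1, A_2, A_3
gamma = [[inner(u, v) for v in vectors] for u in vectors]
basis, kappa = gram_schmidt_with_kappa(vectors)
tau = substitution_tau(gamma)
```

For this example both procedures give the same unit lower triangular matrix, whose last row is (-1/3, -1/3, 1).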

From Lemma 8.5), we have

Lemma 8.6). Let $L = (l_1, \cdots, l_n)$. Suppose the set $A_1, \cdots, A_r$ to be
linearly independent, and $B_1, \cdots, B_r$ to be the orthogonalized set. Then for
every $h$,

$$[A_hL\cdot h-1] = (B_h, L).$$

Theorems 16, 17, and 18 furnish us a new method for finding the most
probable values of the unknowns in the theory of least squares. The formulation of
the system of normal equations may be omitted in this new procedure, which
may be described briefly as follows: After we obtain a set of observations
$A_1, \cdots, A_r$ we orthogonalize this set by means of Gram-Schmidt's process.
Let $L$ be a non-zero vector. The product

$$\begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{r1} & \cdots & b_{rn} \end{pmatrix} \begin{pmatrix} a_{11} & \cdots & a_{r1} & -l_1 \\ \vdots & & \vdots & \vdots \\ a_{1n} & \cdots & a_{rn} & -l_n \end{pmatrix}$$

will give us the result as desired by Gauss's method of substitution.
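The procedure just described can be sketched on a small least-squares problem. Several points here are assumptions made for illustration rather than statements from the paper: the data (fitting an intercept and a slope to three observations), the function names, and the reading of each row of the product as an equation whose first $r$ entries multiply the unknowns and whose last entry is the constant term, the whole row summing to zero.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product (u, v)."""
    return sum(x * y for x, y in zip(u, v))

def orthogonalize(vectors):
    """Gram-Schmidt process; assumes the vectors are linearly independent."""
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for q in basis:
            c = inner(a, q) / inner(q, q)
            b = [x - c * y for x, y in zip(b, q)]
        basis.append(b)
    return basis

def reduced_system(a_vectors, l_vector):
    """Product beta * (alpha' | -L'): row h is (B_h, A_1), ..., (B_h, A_r), -(B_h, L)."""
    return [[inner(b, a) for a in a_vectors] + [-inner(b, l_vector)]
            for b in orthogonalize(a_vectors)]

def back_substitute(m):
    """Solve sum_j m[h][j] * x_j + m[h][r] = 0 for an upper triangular m."""
    r = len(m)
    x = [Fraction(0)] * r
    for h in reversed(range(r)):
        known = sum(m[h][j] * x[j] for j in range(h + 1, r))
        x[h] = -(m[h][r] + known) / m[h][h]
    return x

a_vectors = [[1, 1, 1], [0, 1, 2]]     # coefficients of the intercept and of the slope
l_vector = [1, 2, 2]                   # observed values (illustrative)
system = reduced_system(a_vectors, l_vector)
unknowns = back_substitute(system)
```

The same values follow from the normal equations 3x + 3y = 5, 3x + 5y = 6, which were never formed explicitly here.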

Academia Sinica,
Peiping, China.



A NOTE ON THE ANALYSIS OF VARIANCE 1 


By Solomon Kullback

By considering a set of independent items classified in some relevant manner
into $N$ sets of $s$ items each, and by the use of a dispersion theorem of Prof. J. L.
Coolidge,² Prof. H. L. Rietz³ arrives at estimates of variance, used by Dr. R. A.
Fisher, without making use of arguments involving the number of degrees of
freedom of the items concerned.

By proceeding along the lines followed by Coolidge and Rietz but considering
a set of independent items classified into $N$ sets of $s_i$ ($i = 1, 2, \cdots, N$) items
each, we shall arrive at certain other important results of R. A. Fisher⁴ in his
analysis of variance.

The theorem referred to above is as follows: If $n$ independent quantities
$y_1, y_2, \cdots, y_n$ be given, their expected values being $a_1, a_2, \cdots, a_n$, while the
expected values of their squares are $A_1, A_2, \cdots, A_n$, respectively, and if we
agree to set $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$, $\bar{a} = (1/n)\sum_{i=1}^{n} a_i$, then the expected value of the
variance, $(1/n)\sum_{i=1}^{n} (y_i - \bar{y})^2$, is

$$\frac{n-1}{n^2}\sum_{i=1}^{n}(A_i - a_i^2) + \frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^2. \tag{1}$$

Suppose a set of independent items has been classified in some relevant
manner into $N$ sets of $s_i$ ($i = 1, 2, \cdots, N$) items each as follows:

$$\begin{array}{l} x_{11},\ x_{12},\ \cdots,\ x_{1s_1},\quad \bar{x}_1 \\ x_{21},\ x_{22},\ \cdots,\ x_{2s_2},\quad \bar{x}_2 \\ \cdots \\ x_{N1},\ x_{N2},\ \cdots,\ x_{Ns_N},\quad \bar{x}_N \\ \bar{x} \end{array} \tag{2}$$

where $\bar{x}_i$ ($i = 1, 2, \cdots, N$) is the arithmetic mean of the $i$th set and $\bar{x}$ the mean
of the pooled sample of $s = s_1 + s_2 + \cdots + s_N$ items.

We shall assume that the set (2) is statistically homogeneous in the sense that,
using $E(\ )$ for the expected value of the expression in the parenthesis, we may
let $E(x_{ij}) = a$, $E(x_{ij}^2) = A$, ($i = 1, 2, \cdots, N$, $j = 1, 2, \cdots, s_i$).

1 Presented to the American Mathematical Society, February 23, 1935.

2 Bulletin Am. Math. Soc., Vol. 27 (1921) p. 439.

3 Bulletin Am. Math. Soc., Vol. 38 (1932) pp. 731-735.

4 Proceedings of the International Math. Congress, Toronto, 1924, Vol. 2, p. 802 ff.

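Then, using (1)

$$E\Big(\sum_{j=1}^{s_i}(x_{ij} - \bar{x}_i)^2\Big) = (s_i - 1)(A - a^2). \tag{3}$$

Summing (3) from $i = 1$ to $N$, we have

$$E\Big(\sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar{x}_i)^2\Big) = (A - a^2)\sum_{i=1}^{N}(s_i - 1) = (s - N)(A - a^2). \tag{4}$$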


Similarly, by using (1)

(5) 

But 5 

( 6 ) 

(7) 



N - 1 
" N 


Yj «, [£(.??) - a 2 ]. 


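$$E\Big(\sum_{i=1}^{N} s_i(\bar{x}_i - \bar{x})^2\Big) = (N - 1)(A - a^2). \tag{8}$$

Similarly by using (1)

$$E\Big(\sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar{x})^2\Big) = (s - 1)(A - a^2). \tag{9}$$

Thus, in a statistically homogeneous set of items, classified as in (2), the
following estimates of variance have the same expected value:

$$\begin{aligned} V &= \frac{S}{s - 1}, &&\text{where } S = \sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar{x})^2, \\ V_s &= \frac{S_s}{s - N}, &&\text{where } S_s = \sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar{x}_i)^2, \\ V_{\bar{x}} &= \frac{S_{\bar{x}}}{N - 1}, &&\text{where } S_{\bar{x}} = \sum_{i=1}^{N} s_i(\bar{x}_i - \bar{x})^2. \end{aligned} \tag{10}$$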


These estimates are used in applying the analysis of variance to the study of
the correlation ratio, $\eta$, for uncorrelated material, where $\eta^2 = S_{\bar{x}}/S$.
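The equality of the three expected values can be confirmed exactly for a small case by enumerating every possible sample. The sketch below assumes an illustrative population in which each item takes the values 0, 1, 3 with equal probability, and two sets of sizes 2 and 3; these choices and all variable names are assumptions, not data from the note.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 3]            # equally likely values of each item (illustrative)
sizes = [2, 3]                # s_1, s_2
s, N = sum(sizes), len(sizes)

a = Fraction(sum(values), len(values))                    # E(x)
A = Fraction(sum(v * v for v in values), len(values))     # E(x^2)
variance = A - a * a

totals = [Fraction(0), Fraction(0), Fraction(0)]
for sample in product(values, repeat=s):
    groups, start = [], 0
    for size in sizes:                    # split the pooled sample into the N sets
        groups.append(sample[start:start + size])
        start += size
    grand_mean = Fraction(sum(sample), s)
    means = [Fraction(sum(g), len(g)) for g in groups]
    S_total = sum((x - grand_mean) ** 2 for x in sample)
    S_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    S_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    totals[0] += S_total / (s - 1)
    totals[1] += S_within / (s - N)
    totals[2] += S_between / (N - 1)

count = len(values) ** s                  # all samples are equally likely
expected = [t / count for t in totals]    # expected values of the three estimates
```

All three entries of `expected` equal `variance`, that is, $A - a^2$, in agreement with (4), (8) and (9).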


Office of the Chief Signal Officer, 
Washington, D. C. 


5 Rietz, H. L., loc. cit. p. 733.



A PROBLEM INVOLVING THE LEXIS THEORY OF DISPERSION 

By Walter A. Hendricks 

The attention of the author was recently directed to a study of the hatch- 
ability of chicken eggs at the U. S. Animal Husbandry Experiment Station, 
Beltsville, Maryland. It was necessary to find the average hatchability of the 
fertile eggs incubated for each of a number of lots of birds and the corresponding 
standard errors of those averages. 

It was very apparent that some methods for computing such values, in
common use at the present time, do not give satisfactory results. This is due to the
fact that the fertile eggs produced by different birds vary considerably with
respect to hatchability as well as with respect to number of eggs available for
incubation. It seems reasonable to suppose that the variability in
hatchability of a number of fertile eggs, produced by a given number of birds, should
obey the Lexis law of dispersion. This supposition is based on two hypotheses:

(a) The probability that a fertile egg will hatch is constant for all fertile eggs
produced by the same bird during the time interval under consideration.

(b) The probability that a fertile egg will hatch varies from bird to bird.

The reader familiar with the principles of genetics may question the validity
of the first of these hypotheses. The probability that a fertile egg will hatch is
largely governed by the genes carried by the chromosomes of the ovum of the
hen and the sperm of the male bird which fertilized that ovum. The kinds of
genes carried by various ova and spermatozoa are not necessarily the same, even
when those ova and spermatozoa are produced by the same female and male
birds, respectively. However, if we have a sample of a number of fertile eggs
produced by the same hen, we are justified in assuming that the proportion of
those eggs which will hatch is constant, except for sampling fluctuations, when 
successive samples of fertile eggs produced by the given hen are incubated, pro¬ 
vided, of course, that the eggs in the successive samples were fertilized by the 
same male bird or birds. The limit approached by the proportion of fertile eggs 
which hatch as the number of fertile eggs produced by the given hen becomes 
infinitely large may be defined as the probability that a fertile egg produced by 
that hen will hatch. It will be recognized that this definition is based on purely 
academic considerations, since there are physical limitations to the number of 
fertile eggs which a hen can produce in a given period of time. Hypotheses (a) 
and (b) are to be interpreted in the light of this definition of the probability that 
a fertile egg produced by a given bird will hatch. 

Let $s_1, s_2, \cdots, s_n$ represent the numbers of fertile eggs produced by $n$ birds
during a period of time and let $f_1, f_2, \cdots, f_n$, respectively, represent the numbers
of chicks obtained from those eggs when the eggs are incubated. Let $p_k = f_k/s_k$
represent the hatchability of the fertile eggs produced by the $k$th bird.

The squared standard error of $p_k$ is given by the Lexis formula:¹

$$\sigma^2_{p_k} = \frac{PQ}{s_k} + \frac{s_k - 1}{n s_k}\sum_{t=1}^{n}(P_t - P)^2 \tag{1}$$

in which the $P_t$ represent the respective probabilities that the fertile eggs
produced by the $n$ birds will hatch, $P$ is the arithmetic mean of the $P_t$, and $Q$ is
equal to $1 - P$.

The values of the probabilities, $P_t$, are not known. However, as a first
approximation to equation (1) we may write:

$$\sigma^2_{p_k} = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k}\sum_{t=1}^{n}(p_t - p)^2 \tag{2}$$

in which $p$ is the arithmetic mean of the $p_t$ and $q$ is equal to $1 - p$.

The product, $pq$, can be accepted as a reasonably close approximation to the
product, $PQ$, but the expression, $\sum_{t=1}^{n}(p_t - p)^2$, will, in general, be greater than
the expression, $\sum_{t=1}^{n}(P_t - P)^2$. The reason for this is apparent when we
consider that if each of these two expressions is divided by $n$, the former yields an
estimate of the squared standard deviation of the $p_t$ while the latter yields an
estimate of the squared standard deviation of the $P_t$. The standard deviation of
the $p_t$ will, in general, be greater than that of the $P_t$ because the $p_t$ are more or
less imperfect estimates of the $P_t$ and are, therefore, subject to sampling errors
from which the $P_t$ are free.

We may write:

$$\frac{1}{n}\sum_{t=1}^{n}(P_t - P)^2 = \frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2 - \sigma_c^2 \tag{3}$$

in which $\sigma_c^2$ is an appropriate correction as yet undefined.

Since the $p_t$ would approach the $P_t$ as statistical limits if each of the $s_t$ were
made extremely large, it follows that $\sigma_c^2$ must approach zero as each of the $s_t$
approaches infinity. Furthermore, if $P_1 = P_2 = \cdots = P_n = P$, we must have:

$$\frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2 - \sigma_c^2 = 0 \quad\text{or}\quad \sigma_c^2 = \frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2. \tag{4}$$


1 The formula as given in this paper is a modification of that given by Rietz, H. L. (1927)
in his book, Mathematical Statistics, Open Court Publishing Co., Chicago, which was
necessary in order to make it applicable to relative frequencies.





These conditions suggest that $\sigma_c^2$ be defined by the equation:

$$\sigma_c^2 = \frac{pq}{n}\sum_{t=1}^{n}\frac{1}{s_t}. \tag{5}$$

If $\sigma_c^2$ is so defined, it will obviously approach zero as each of the $s_t$ approaches
infinity. Furthermore, it has been shown by Yule² that if we have a series of $n$
relative frequencies, such as the $p_t$ under discussion, based on $n$ samples of
unequal size, and the probabilities of the occurrence and non-occurrence,
respectively, of the particular event under consideration are constant from
sample to sample, the squared standard deviation of those relative frequencies
is given by a relation such as that used to define $\sigma_c^2$ in equation (5).
Therefore, the second condition is also satisfied. $\sigma_c^2$ may be interpreted as
representing that part of the squared standard deviation of the $p_t$ which is due to
the unreliability of the $p_t$ as estimates of the $P_t$.

Therefore, it seems reasonable to write:

$$\frac{1}{n}\sum_{t=1}^{n}(P_t - P)^2 = \frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2 - \frac{pq}{n}\sum_{t=1}^{n}\frac{1}{s_t}. \tag{6}$$

Combining equations (1) and (6), we obtain the following formula for
calculating the squared standard error of $p_k$:

$$\sigma^2_{p_k} = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k}\Big[\sum_{t=1}^{n}(p_t - p)^2 - pq\sum_{t=1}^{n}\frac{1}{s_t}\Big]. \tag{7}$$

Since the weight of a measurement is inversely proportional to the square of
the standard error of the measurement, we are now in a position to calculate a
weighted mean, $\bar{p}$, of the $p_t$.

$$\bar{p} = \frac{\sum_{t=1}^{n} w_t p_t}{\sum_{t=1}^{n} w_t} \tag{8}$$

in which:

$$w_t = \frac{1}{\sigma^2_{p_t}}. \tag{9}$$

The squared standard error of $\bar{p}$ is given by the familiar formula:

$$\sigma^2_{\bar{p}} = \frac{\sum_{t=1}^{n} w_t(p_t - \bar{p})^2}{(n - 1)\sum_{t=1}^{n} w_t}. \tag{10}$$


2 Yule, G. Udny, 1927. Introduction to the Theory of Statistics, Charles Griffin and 
Co., London. 





It would seem that $\bar{p}$ may be accepted as a good estimate of the average
hatchability of the fertile eggs produced by the given lot of birds, and that
equation (10) may be used to obtain a valid estimate of the reliability of $\bar{p}$.
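The calculation for one lot of birds, from the squared standard error of each $p_k$ through the weights to the weighted mean and its squared standard error (equation (10)), can be sketched as follows. The egg and chick counts are invented for illustration, the function name is an assumption, and the guard against a non-positive squared standard error is an added precaution: the bracketed correction term can be negative, and the paper does not discuss that case.

```python
from fractions import Fraction

def weighted_hatchability(eggs, chicks):
    """Return (weighted mean, squared standard error of the weighted mean).

    eggs[k] is s_k, the number of fertile eggs of bird k; chicks[k] is the
    number of those eggs that hatched.
    """
    n = len(eggs)
    p = [Fraction(f, s) for f, s in zip(chicks, eggs)]        # hatchability per bird
    p_mean = sum(p) / n                                       # arithmetic mean p
    pq = p_mean * (1 - p_mean)
    spread = sum((pt - p_mean) ** 2 for pt in p)
    correction = pq * sum(Fraction(1, s) for s in eggs)
    # Squared standard error of each p_k: pq/s_k plus the corrected spread term.
    variances = [pq / s + Fraction(s - 1, n * s) * (spread - correction) for s in eggs]
    if any(v <= 0 for v in variances):
        raise ValueError("non-positive squared standard error; weights undefined")
    weights = [1 / v for v in variances]                      # inverse-variance weights
    total = sum(weights)
    p_bar = sum(w * pt for w, pt in zip(weights, p)) / total
    se_squared = sum(w * (pt - p_bar) ** 2 for w, pt in zip(weights, p)) / ((n - 1) * total)
    return p_bar, se_squared

eggs = [40, 30, 50]          # hypothetical counts of fertile eggs
chicks = [30, 24, 35]        # hypothetical counts of chicks hatched
p_bar, se_squared = weighted_hatchability(eggs, chicks)
standard_error = float(se_squared) ** 0.5
```

Exact fractions are used so that the intermediate quantities can be compared with a hand calculation; floating point would serve equally well in practice.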

However, the problem is not quite so simple. In the first place, there is 
usually a small amount of positive correlation between the number of fertile 
eggs produced by a bird and the hatchability of those eggs. Secondly, as 
pointed out earlier in this paper, the hatchability of fertile eggs is influenced to 
some extent by the male birds used to fertilize the eggs. The error involved in 
neglecting the correlation between hatchability and number of fertile eggs incu¬ 
bated does not seem to be of much importance in those practical problems which 
have come to the author's attention. The effects of differences among the male 
birds may be largely obviated in experimental work by frequently transferring 
male birds from lot to lot during the experimental period. 

The best test of the suitability of a particular formula for calculating the 
standard error of an average is to compare the value of the standard error 
calculated by means of the formula with the corresponding value obtained by 
direct calculation from the distribution of a number of such averages obtained 
under essentially the same conditions. The accompanying table gives the 
standard error of the weighted average hatchability of fertile eggs calculated 
for each of four lots of birds by means of equation (10), together with the corre¬ 
sponding values obtained from the distribution of averages. The former are 
designated as the “predicted” values and the latter are designated as the 
“observed” values. In the calculation of the “observed” values, the various 
averages were assigned the same weights which were used in the calculation 
of the “predicted” values. 


Comparison of “predicted” and “observed” standard errors of the weighted average
hatchability of fertile eggs, calculated for each of four lots of birds

    Lot      $\bar{p}$        Standard error of $\bar{p}$
                         “Predicted”    “Observed”
    1        0.7684      0.0287         0.0327
    2        0.7115      0.0533         0.0561
    3        0.6834      0.0355         0.0379
    4        0.7260      0.0615         0.0674


The data used in these calculations involved a total of 74 birds, approxi¬ 
mately equally divided among the four lots, and a total of 2,901 fertile eggs 
which were produced and incubated during an experimental period of 48 weeks. 
The agreement between the “predicted” and “observed” standard errors of the 
weighted average hatchability for each lot of birds is excellent. However, the 
author's experience with biological data tends to make him doubt that such 






close agreement will always be found when such data are subjected to the
above treatment. The agreement in the present illustration could be less 
close without indicating that the method of calculating the “predicted” stand¬ 
ard errors is unsound. 

Bureau of Animal Industry, 

U. S. Department of Agriculture, 

Washington, D. C. 



A METHOD FOR DETERMINING THE COEFFICIENTS OF A 
CHARACTERISTIC EQUATION 

Paul Horst


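For the characteristic equation

$$\begin{vmatrix} a_{11} - x & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} - x \end{vmatrix} = (-1)^n\big(x^n - c_1x^{n-1} + c_2x^{n-2} - \cdots + (-1)^n c_n\big) = (-1)^n(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n) \tag{1}$$

it is well known that

$$c_i = A_i$$

where $A_i$ is the sum of all $i$th order co-axial minors of the determinant

$$A = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}. \tag{2}$$

If $n$ exceeds 3 or 4, the process of calculating all possible principal minors is
very cumbersome.

But another more systematic method of calculating the $c$'s may be adopted.
Suppose we define

$$A^p = \big|\,a_{ij}^{(p)}\,\big| \tag{3}$$

and

$$\sum_{i=1}^{n} a_{ii}^{(p)} = S_p. \tag{4}$$

It may be proved¹ that

$$S_p = \sum_{i=1}^{n} \alpha_i^p. \tag{5}$$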

But from Newton's identities² we have, with signs alternating to match the
alternating signs of the coefficients in (1),

$$S_p - c_1S_{p-1} + c_2S_{p-2} - \cdots + (-1)^{p-1}c_{p-1}S_1 + (-1)^p\,p\,c_p = 0. \tag{6}$$

1 Muir, T. & Metzler, W. H., “A Treatise on the Theory of Determinants,” p. 606,
¶ 650 and 651.

2 Dickson, L. E., “First Course in the Theory of Equations,” p. 134, ¶ 106.



Newton’s identities are ordinarily employed for calculating the sums of the 
powers of the roots of a polynomial when the coefficients are known. They may 
be employed equally well, however, for calculating the coefficients when the 
sums of the powers are given. Thus by means of equations (5) and (6) the 
coefficients of (1) may be readily calculated. 
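
As a rough illustration of the procedure just described, the sketch below forms the sums $S_p$ as the traces of successive powers of the matrix and then solves Newton's identities for the coefficients. It is a minimal sketch, not the author's own computing scheme: the function names are hypothetical, and it assumes the alternating-sign convention of (1), under which $c_1$ is the trace and $c_n$ the determinant, so that the identities take the form $p\,c_p = c_{p-1}S_1 - c_{p-2}S_2 + \cdots + (-1)^{p-1}S_p$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Ordinary row-by-column product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_coefficients(a):
    """Return [c_1, ..., c_n] computed from the traces S_p of the powers of a.

    Assumed convention: x^n - c_1 x^(n-1) + c_2 x^(n-2) - ... (alternating signs).
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    power, sums = a, []
    for _ in range(n):
        sums.append(sum(power[i][i] for i in range(n)))  # S_p = trace of A^p
        power = mat_mul(power, a)
    c = [Fraction(1)]  # c_0 = 1
    for p in range(1, n + 1):
        # Newton: p*c_p = c_{p-1} S_1 - c_{p-2} S_2 + ... + (-1)^(p-1) S_p
        total = sum((-1) ** (i - 1) * c[p - i] * sums[i - 1]
                    for i in range(1, p + 1))
        c.append(total / p)
    return c[1:]
```

For the symmetric matrix with rows (2, 1) and (1, 2), for instance, `char_coefficients` gives the trace 4 and the determinant 3.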

If in (2) $a_{ij} = a_{ji}$, the calculation of the successive $A^p$ values is straightforward.
The determinant $A$ is used as a constant multiplier so that

$$A \cdot A = A^2, \quad A \cdot A^2 = A^3, \quad \ldots, \quad A \cdot A^{n-1} = A^n$$

and the multiplication is column by column. That is,

$$a_{ij}^{(p)} = \sum_{k=1}^{n} a_{ki}\, a_{kj}^{(p-1)}.$$


THE GENERALIZED PROBLEM OF CORRECT MATCHINGS 

By Dwight W. Chapman

A method common to many experimental and testing procedures in psychology 
and education is to require an individual to match, as best he can, members of 
one series of items with members of a second series of quite different items certain
of which are in some sense true apposites of items in the first series. Thus the 
experimental psychology of personality has often investigated the ability of 
graphologists or laymen to pair samples of handwriting produced by a group of 
persons with, say, character-sketches of these same persons; and the excess of 
correct matchings thus produced over the number to be expected by chance has 
been used as evidence that the expressive movement of handwriting affords 
characteristics diagnostic of personal traits. Fortunately, the excesses experi¬ 
mentally obtained have often been so large as obviously to exclude the operation 
of chance alone. But many empirical results show small excesses only; and the
interpretation of such findings has not hitherto been subjected to rigid statistical 
analysis. 

The particular statistical problem resident in this experimental procedure is 
twofold, involving the estimation of the significance of (a) a given number of cor¬ 
rect matchings produced by one individual, and (b) a given mean number of cor¬ 
rect matchings produced by a group of individuals working with the same mate¬ 
rial independently. 

Furthermore, two cases arise in practice: (1) the two series of items are of
equal length, and each item in either series has a true apposite in the other series;
or (2) the two series may be of unequal length, in which case the longer series
contains not only a true apposite for each item of the shorter series, but, in
addition, a certain number of extra, irrelevant items which cannot be correctly
matched with any items in the shorter series. I have already given the solution
to problems (a) and (b) for case (1).¹ But case (1) forms only a corollary of the
more general case (2), to the solution of which this present paper is devoted.

(a) The Significance of a Given Number of Correct Matchings Resulting from a Single Trial

Let there be given a series of $u$ $x$-items,

$$x_1, x_2, \ldots, x_t, \ldots, x_u$$

and a series of $t$ $y$-items,

$$y_1, y_2, \ldots, y_t.$$

¹ The Statistics of the Method of Correct Matchings, Amer. Jour. Psychol., 46, 1934, 287-298.





Let $t \le u$, and let the first $t$ $x$-items be in some sense true apposites of the
correspondingly numbered $y$-items, so that if $y_j$ be paired with $x_j$ ($j = 1, 2, \ldots, t$),
this pairing will constitute a correct matching.

The first problem which arises is that of determining the probability that a
single random arrangement of the $t$ $y$-items against $t$ of the $x$-items will result in
exactly $s$ ($= 0, 1, 2, \ldots, t$) correct matchings.

We begin by putting the first $s$ $y$-items in correspondence with their apposite
$x$-items. Then the number of arrangements of the $t$ $y$-items in which only these
$s$ are correctly matched is the number of arrangements of the remaining $t - s$
$y$-items against the remaining $u - s$ $x$-items such that no correct matchings occur.
With respect to these items, let

$n$ = the number of all possible arrangements,

$n(Y_j)$ = the number of arrangements such that at least the $j$th item is
correctly matched with its apposite,

$n(Y_j Y_k)$ = the number of arrangements such that at least both the $j$th and $k$th
items are matched with their apposites, etc.;

and let

$n(\bar{Y}_j)$ = the number of arrangements such that at least the $j$th item is not
matched with its apposite,

$n(\bar{Y}_j \bar{Y}_k)$ = the number of arrangements such that at least the $j$th and $k$th items
are not matched with their apposites, etc.

We have then to evaluate the expression $n(\bar{Y}_{s+1}\bar{Y}_{s+2}\cdots\bar{Y}_t)$, the number of
arrangements of the items remaining, after setting $s$ of them correctly matched,
such that no further correct matchings occur.

Now it can be shown that²

$$\begin{aligned}
n(\bar{Y}_{s+1}\bar{Y}_{s+2}\cdots\bar{Y}_t) = n &- [\,n(Y_{s+1}) + n(Y_{s+2}) + \cdots + n(Y_t)\,] \\
&+ [\,n(Y_{s+1}Y_{s+2}) + n(Y_{s+1}Y_{s+3}) + \cdots + n(Y_{t-1}Y_t)\,] \\
&- [\,n(Y_{s+1}Y_{s+2}Y_{s+3}) + \cdots + n(Y_{t-2}Y_{t-1}Y_t)\,] \\
&+ \cdots \\
&+ (-1)^{t-s}\, n(Y_{s+1}Y_{s+2}\cdots Y_t).
\end{aligned}$$

The value of the expressions on the right side of this equation can be
determined as follows:

² H. Whitney, A Logical Expansion in Mathematics, Bull. Amer. Math. Soc., 1932, 572-579.





The value of $n$ is the number of ways in which $t - s$ items can be arranged
against $u - s$ items, which is

$$\frac{(u-s)!}{[(u-s)-(t-s)]!} = \frac{(u-s)!}{(u-t)!}.$$


The value of the first bracket, the number of arrangements of these items
such that some one of them is correctly matched, is derived by holding one of
the items matched, which can be chosen in $t - s$ ways. This leaves $t - s - 1$
$y$-items, which can be arranged against the remaining $u - s - 1$ $x$-items in
$(u-s-1)!/(u-t)!$ ways. The product of these two expressions gives us for
the value of the first bracket

$$[\,n(Y_{s+1}) + \cdots + n(Y_t)\,] = \frac{(t-s)!\,(u-s-1)!}{1!\,(t-s-1)!\,(u-t)!}.$$


To evaluate the second bracket, we hold two of the $t - s$ items matched, which
can be chosen in $(t-s)!/[2!\,(t-s-2)!]$ ways. There remain $t - s - 2$
$y$-items which can be arranged against the remaining $u - s - 2$ $x$-items in
$(u-s-2)!/(u-t)!$ ways. The product of these two expressions gives us

$$[\,n(Y_{s+1}Y_{s+2}) + \cdots + n(Y_{t-1}Y_t)\,] = \frac{(t-s)!\,(u-s-2)!}{2!\,(t-s-2)!\,(u-t)!}.$$


Continuing thus, we develop the following series for the number of arrangements
of $t$ items against $u$ items such that only the first $s$ are correctly matched:

$$n(\bar{Y}_{s+1}\bar{Y}_{s+2}\cdots\bar{Y}_t) = \frac{(u-s)!}{(u-t)!} - \frac{(t-s)!\,(u-s-1)!}{1!\,(t-s-1)!\,(u-t)!} + \frac{(t-s)!\,(u-s-2)!}{2!\,(t-s-2)!\,(u-t)!} - \cdots + (-1)^{t-s}\,\frac{(t-s)!\,(u-t)!}{(t-s)!\,(u-t)!}.$$


In order to express the number of arrangements, $N_{(s)}$, such that any $s$ correct
matchings occur, we must multiply the above series by $t!/[s!\,(t-s)!]$, which is
the number of ways in which $s$ items can be chosen from $t$ items:

$$N_{(s)} = \frac{t!}{s!\,(t-s)!}\left[\frac{(u-s)!}{(u-t)!} - \frac{(t-s)!\,(u-s-1)!}{1!\,(t-s-1)!\,(u-t)!} + \cdots + (-1)^{t-s}\,\frac{(t-s)!\,(u-t)!}{(t-s)!\,(u-t)!}\right].$$


And in order to obtain the probability that a single random arrangement will
result in exactly $s$ correct matchings, we must further divide by $u!/(u-t)!$,
which is the total number of ways in which $t$ items can be arranged against
$u$ items. Calling this probability $P_{(s)}$, we have then

$$P_{(s)} = \frac{t!\,(u-t)!}{u!\,s!\,(t-s)!}\left[\frac{(u-s)!}{(u-t)!} - \frac{(t-s)!\,(u-s-1)!}{1!\,(t-s-1)!\,(u-t)!} + \cdots + (-1)^{t-s}\,\frac{(t-s)!\,(u-t)!}{(t-s)!\,(u-t)!}\right].$$







Finally, factoring $(t-s)!/(u-t)!$ out of all terms in the bracket, the series
simplifies to³

$$P_{(s)} = \frac{t!}{s!\,u!}\left[\frac{(u-s)!}{0!\,(t-s)!} - \frac{(u-s-1)!}{1!\,(t-s-1)!} + \frac{(u-s-2)!}{2!\,(t-s-2)!} - \cdots + (-1)^{t-s}\,\frac{(u-t)!}{(t-s)!\,0!}\right]. \qquad (1)$$
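
Equation (1) can be checked on small cases by direct enumeration. The following is a minimal sketch under stated assumptions: the function names `p_exact` and `p_brute` are hypothetical, exact rational arithmetic is used to avoid rounding, and the brute-force count is practical only for small $t$ and $u$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_exact(s, t, u):
    """P_(s) from equation (1): exactly s correct matchings, with t <= u."""
    series = sum(Fraction((-1) ** j * factorial(u - s - j),
                          factorial(j) * factorial(t - s - j))
                 for j in range(t - s + 1))
    return Fraction(factorial(t), factorial(s) * factorial(u)) * series


def p_brute(s, t, u):
    """The same probability found by listing every arrangement."""
    # arr[j] is the x-item paired with y-item j; a match means arr[j] == j.
    arrangements = list(permutations(range(u), t))
    hits = sum(1 for arr in arrangements
               if sum(arr[j] == j for j in range(t)) == s)
    return Fraction(hits, len(arrangements))
```

With $t = 2$ and $u = 3$ there are six arrangements, of which three give no correct matching, two give one, and one gives two, so the probabilities are 1/2, 1/3 and 1/6.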


In any practical situation, the significant question is not the probability that
exactly $s$ correct matchings shall occur, but the probability of $s$ or more correct
matchings. Obviously

$$P_{(s \text{ or more})} = P_{(s)} + P_{(s+1)} + \cdots + P_{(t)},$$

whence, by equation (1),

$$\begin{aligned}
P_{(s \text{ or more})} ={}& \frac{t!}{s!\,u!}\left[\frac{(u-s)!}{0!\,(t-s)!} - \frac{(u-s-1)!}{1!\,(t-s-1)!} + \frac{(u-s-2)!}{2!\,(t-s-2)!} - \cdots + (-1)^{t-s}\,\frac{(u-t)!}{(t-s)!\,0!}\right] \\
&+ \frac{t!}{(s+1)!\,u!}\left[\frac{(u-s-1)!}{0!\,(t-s-1)!} - \frac{(u-s-2)!}{1!\,(t-s-2)!} + \cdots + (-1)^{t-s-1}\,\frac{(u-t)!}{(t-s-1)!\,0!}\right] \\
&+ \frac{t!}{(s+2)!\,u!}\left[\frac{(u-s-2)!}{0!\,(t-s-2)!} - \cdots + (-1)^{t-s-2}\,\frac{(u-t)!}{(t-s-2)!\,0!}\right] \\
&+ \cdots \\
&+ \frac{t!}{t!\,u!}\left[\frac{(u-t)!}{0!\,0!}\right].
\end{aligned} \qquad (2)$$


Or, collecting terms in a form better suited to practical computation from tables
of factorials and reciprocals,

$$\begin{aligned}
P_{(s \text{ or more})} = \frac{t!}{u!}\Biggl\{ & \frac{(u-s)!}{(t-s)!}\left[\frac{1}{0!\,s!}\right] \\
&+ \frac{(u-s-1)!}{(t-s-1)!}\left[\frac{1}{0!\,(s+1)!} - \frac{1}{1!\,s!}\right] \\
&+ \frac{(u-s-2)!}{(t-s-2)!}\left[\frac{1}{0!\,(s+2)!} - \frac{1}{1!\,(s+1)!} + \frac{1}{2!\,s!}\right] \\
&+ \cdots \\
&+ \frac{(u-t)!}{0!}\left[\frac{1}{0!\,t!} - \frac{1}{1!\,(t-1)!} + \frac{1}{2!\,(t-2)!} - \cdots + (-1)^{t-s}\,\frac{1}{(t-s)!\,s!}\right]\Biggr\}.
\end{aligned}$$

³ In the special case in which the series of $x$-items and the series of $y$-items are of the same
length, whence $t = u$, equation (1) reduces to

$$P_{(s)} = \frac{1}{s!}\left[\frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{t-s}\,\frac{1}{(t-s)!}\right].$$
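
The tail probability can be computed either by summing equation (1) term by term, as in (2), or from the collected form. The sketch below does both so that the two can be compared; the function names are hypothetical, and the inner sum in `p_at_least_collected` is simply a transcription of the bracketed coefficients under the assumption that they alternate in sign as written above.

```python
from fractions import Fraction
from math import factorial


def p_exact(s, t, u):
    """P_(s) from equation (1)."""
    series = sum(Fraction((-1) ** j * factorial(u - s - j),
                          factorial(j) * factorial(t - s - j))
                 for j in range(t - s + 1))
    return Fraction(factorial(t), factorial(s) * factorial(u)) * series


def p_at_least(s, t, u):
    """Probability of s or more correct matchings, summed as in (2)."""
    return sum(p_exact(k, t, u) for k in range(s, t + 1))


def p_at_least_collected(s, t, u):
    """The same probability from the collected form."""
    total = Fraction(0)
    for m in range(t - s + 1):
        # Bracketed coefficient of (u-s-m)!/(t-s-m)!
        bracket = sum(Fraction((-1) ** (m - i),
                               factorial(s + i) * factorial(m - i))
                      for i in range(m + 1))
        total += Fraction(factorial(u - s - m), factorial(t - s - m)) * bracket
    return Fraction(factorial(t), factorial(u)) * total
```

The collected form needs only one factorial ratio per group of terms, which is presumably why it suits work from printed tables.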


(b) The Significance of a Given Mean Number of Correct Matchings Resulting from n Independent Trials

A frequent practical situation is that in which interest centers on the
significance of the mean number of correct matchings achieved by a group of $n$
individuals working independently with the same two series.

In order to determine the probability that the mean number of correct
matchings, $\bar{s}$, resulting from $n$ independent trials shall equal or exceed a given value, we
are required to describe the distribution of the means of samples of size $n$ drawn
at random from a parent population in which the variable is $s$ ($= 0, 1, 2, \ldots, t$)
with relative frequencies $P_{(0)}, P_{(1)}, P_{(2)}, \ldots, P_{(t)}$, given by equation (1). The
tabulation of this parent distribution follows:

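Table I: Distribution of s

s      Relative frequency ($= P_{(s)}$)

0      $\dfrac{t!}{0!\,u!}\left[\dfrac{u!}{0!\,t!} - \dfrac{(u-1)!}{1!\,(t-1)!} + \dfrac{(u-2)!}{2!\,(t-2)!} - \dfrac{(u-3)!}{3!\,(t-3)!} + \cdots + (-1)^{t}\,\dfrac{(u-t)!}{t!\,0!}\right]$

1      $\dfrac{t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\,\dfrac{(u-t)!}{(t-1)!\,0!}\right]$

2      $\dfrac{t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\,\dfrac{(u-t)!}{(t-2)!\,0!}\right]$

...

t      $\dfrac{t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]$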


We now determine the first four moments, $\nu_1$, $\nu_2$, $\nu_3$, and $\nu_4$, of this distribution
about the origin $s = 0$. Since, in general,

$$\nu_k = \sum_{s=0}^{t}\left[s^k \times (\text{Relative frequency of } s)\right] = \sum_{s=0}^{t} s^k P_{(s)},$$

the tabulation for the computation of any moment is as follows:





Table II: The Computation of the kth Moment of the Distribution of s

$\dfrac{1^k\,t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\,\dfrac{(u-t)!}{(t-1)!\,0!}\right]$

$\dfrac{2^k\,t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\,\dfrac{(u-t)!}{(t-2)!\,0!}\right]$

$\dfrac{3^k\,t!}{3!\,u!}\left[\dfrac{(u-3)!}{0!\,(t-3)!} - \cdots + (-1)^{t-3}\,\dfrac{(u-t)!}{(t-3)!\,0!}\right]$

...

$\dfrac{t^k\,t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]$


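Noting that $\dfrac{1^k}{1!} = \dfrac{1^{k-1}}{0!}$, $\dfrac{2^k}{2!} = \dfrac{2^{k-1}}{1!}$, $\ldots$, $\dfrac{t^k}{t!} = \dfrac{t^{k-1}}{(t-1)!}$, and multiplying the terms
in brackets by these factors, we develop Table III:

Table III

Columns, left to right: 1st diagonal, 2nd diagonal, 3rd diagonal, ..., $t$th diagonal. Each row begins one column further to the right than the row above it, so that the $m$th term of the row in $j^{k-1}$ lies in the $(j+m-1)$th diagonal.

$\dfrac{t!}{u!}\left[\dfrac{1^{k-1}(u-1)!}{0!\,0!\,(t-1)!} - \dfrac{1^{k-1}(u-2)!}{0!\,1!\,(t-2)!} + \dfrac{1^{k-1}(u-3)!}{0!\,2!\,(t-3)!} - \cdots + (-1)^{t-1}\,\dfrac{1^{k-1}(u-t)!}{0!\,(t-1)!\,0!}\right]$   ($t$ terms)

$\dfrac{t!}{u!}\left[\dfrac{2^{k-1}(u-2)!}{1!\,0!\,(t-2)!} - \dfrac{2^{k-1}(u-3)!}{1!\,1!\,(t-3)!} + \cdots + (-1)^{t-2}\,\dfrac{2^{k-1}(u-t)!}{1!\,(t-2)!\,0!}\right]$   ($t-1$ terms)

$\dfrac{t!}{u!}\left[\dfrac{3^{k-1}(u-3)!}{2!\,0!\,(t-3)!} - \cdots + (-1)^{t-3}\,\dfrac{3^{k-1}(u-t)!}{2!\,(t-3)!\,0!}\right]$   ($t-2$ terms)

...

$\dfrac{t!}{u!}\left[\dfrac{t^{k-1}(u-t)!}{(t-1)!\,0!\,0!}\right]$   (1 term)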





Since each series in brackets is one term shorter than the preceding series, the
table forms a system of $t$ diagonals. The sum which gives us $\nu_k$ may therefore
be considered as the sum of these diagonals.

Now, from inspection, it is evident that the general diagonal is of the form

$$s\text{th diagonal} = \frac{t!\,(u-s)!}{u!\,(t-s)!}\left[\frac{s^{k-1}}{(s-1)!\,0!} - \frac{(s-1)^{k-1}}{(s-2)!\,1!} + \cdots + (-1)^{s-1}\,\frac{1^{k-1}}{0!\,(s-1)!}\right] = \frac{t!\,(u-s)!}{u!\,(t-s)!}\cdot\frac{1}{(s-1)!}\sum_{r=0}^{s-1}(-1)^r\binom{s-1}{r}(s-r)^{k-1}.$$

But it can be shown⁴ that

$$\sum_{r=0}^{s-1}(-1)^r\binom{s-1}{r}(s-r)^{k-1} = 0 \quad \text{when } k < s.$$

Whence

$$s\text{th diagonal} = 0 \quad \text{when } k < s.$$


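Therefore $\nu_k$ is given simply by the sum of the first $k$ diagonals of Table III.
Or, in general,

$$\begin{aligned}
\nu_k ={}& \frac{t!\,(u-1)!}{u!\,(t-1)!}\left[\frac{1^{k-1}}{0!\,0!}\right] \\
&+ \frac{t!\,(u-2)!}{u!\,(t-2)!}\left[\frac{2^{k-1}}{1!\,0!} - \frac{1^{k-1}}{0!\,1!}\right] \\
&+ \frac{t!\,(u-3)!}{u!\,(t-3)!}\left[\frac{3^{k-1}}{2!\,0!} - \frac{2^{k-1}}{1!\,1!} + \frac{1^{k-1}}{0!\,2!}\right] \\
&+ \cdots \\
&+ \frac{t!\,(u-k)!}{u!\,(t-k)!}\left[\frac{k^{k-1}}{(k-1)!\,0!} - \frac{(k-1)^{k-1}}{(k-2)!\,1!} + \cdots + (-1)^{k-1}\,\frac{1^{k-1}}{0!\,(k-1)!}\right].
\end{aligned} \qquad (3)$$

To this equation we must, of course, add the condition $k \le t$.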
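
The reduction to the first $k$ diagonals can be checked numerically against the defining sum $\sum s^k P_{(s)}$. The following is a minimal sketch with hypothetical function names; it uses exact fractions and stops at diagonal $\min(k, t)$, since no later diagonal exists when $k > t$.

```python
from fractions import Fraction
from math import factorial


def p_exact(s, t, u):
    """P_(s) from equation (1)."""
    series = sum(Fraction((-1) ** j * factorial(u - s - j),
                          factorial(j) * factorial(t - s - j))
                 for j in range(t - s + 1))
    return Fraction(factorial(t), factorial(s) * factorial(u)) * series


def nu_direct(k, t, u):
    """k-th moment about s = 0, straight from the definition."""
    return sum(Fraction(s) ** k * p_exact(s, t, u) for s in range(t + 1))


def nu_diagonals(k, t, u):
    """k-th moment as the sum of the first k diagonals, equation (3)."""
    total = Fraction(0)
    for s in range(1, min(k, t) + 1):
        lead = Fraction(factorial(t) * factorial(u - s),
                        factorial(u) * factorial(t - s))
        bracket = sum(Fraction((-1) ** r * (s - r) ** (k - 1),
                               factorial(s - 1 - r) * factorial(r))
                      for r in range(s))
        total += lead * bracket
    return total
```

For $t = 5$, $u = 8$ both routes give $\nu_1 = 5/8$ and $\nu_2 = 55/56$.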

⁴ E. Netto, Lehrbuch der Combinatorik, Leipzig, 1901, 249, Formula 17.





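Solving now for the first four moments, we have

$$\begin{aligned}
\nu_1 &= \frac{t}{u}, \\
\nu_2 &= \frac{t}{u}\left[1 + \frac{t-1}{u-1}\right], \\
\nu_3 &= \frac{t}{u}\left[1 + 3\,\frac{t-1}{u-1} + \frac{(t-1)(t-2)}{(u-1)(u-2)}\right], \\
\nu_4 &= \frac{t}{u}\left[1 + 7\,\frac{t-1}{u-1} + 6\,\frac{(t-1)(t-2)}{(u-1)(u-2)} + \frac{(t-1)(t-2)(t-3)}{(u-1)(u-2)(u-3)}\right].
\end{aligned} \qquad (4)$$

If now we define, for convenience,

$$a = \frac{t}{u}, \quad b = \frac{t-1}{u-1}, \quad c = \frac{t-2}{u-2}, \quad d = \frac{t-3}{u-3},$$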

we have, for the constants of the distribution of $s$,

$$\begin{aligned}
\text{Mean} &= \nu_1 = a, \\
\mu_2 &= \nu_2 - \nu_1^2 = a(1+b) - a^2, \quad \text{whence } \sigma = \sqrt{a(1+b) - a^2}, \\
\mu_3 &= \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = a(1 + 3b + bc) - 3a^2(1+b) + 2a^3, \\
\mu_4 &= \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = a(1 + 7b + 6bc + bcd) - 4a^2(1 + 3b + bc) + 6a^3(1+b) - 3a^4.
\end{aligned} \qquad (5)$$

From these constants we can determine the skewness and kurtosis of the
distribution of $s$,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \text{and} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}. \qquad (6)$$
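
Equations (5) and (6) translate directly into a short routine. The sketch below is illustrative only: `matching_constants` is a hypothetical name, exact fractions are used, and $u \ge 4$ is assumed so that the ratio $d$ is defined.

```python
from fractions import Fraction


def matching_constants(t, u):
    """Moments about the mean and beta-constants of s, from (5) and (6)."""
    a = Fraction(t, u)
    b = Fraction(t - 1, u - 1)
    c = Fraction(t - 2, u - 2)
    d = Fraction(t - 3, u - 3)  # requires u >= 4
    mu2 = a * (1 + b) - a ** 2
    mu3 = a * (1 + 3 * b + b * c) - 3 * a ** 2 * (1 + b) + 2 * a ** 3
    mu4 = (a * (1 + 7 * b + 6 * b * c + b * c * d)
           - 4 * a ** 2 * (1 + 3 * b + b * c)
           + 6 * a ** 3 * (1 + b) - 3 * a ** 4)
    return {"mean": a, "mu2": mu2, "mu3": mu3, "mu4": mu4,
            "beta1": mu3 ** 2 / mu2 ** 3, "beta2": mu4 / mu2 ** 2}
```

With $t = u = 5$ all four ratios equal 1 and the routine returns $\mu_2 = 1$, $\mu_3 = 1$, $\mu_4 = 4$.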

Now it is known that the means of samples of size $n$ drawn from a parent
population with constants $\beta_1$ and $\beta_2$ are distributed in such a way that

$$\beta_{1\,(\text{means})} = \frac{\beta_1}{n}, \qquad \beta_{2\,(\text{means})} = 3 + \frac{\beta_2 - 3}{n}. \qquad (7)$$


Therefore, having determined the beta-constants for the distribution of $s$, we
can determine the beta-constants of the distribution of $\bar{s}$, the mean number of
correct matchings resulting from $n$ independent trials.

Now when $t = u \ge 4$, we have

$$a = b = c = d = 1,$$

and equations (5) give us for the distribution of $s$

$$\text{Mean} = 1, \quad \mu_2 = 1, \quad \mu_3 = 1, \quad \mu_4 = 4, \quad \text{whence} \quad \beta_1 = 1, \quad \beta_2 = 4,$$

and therefore, for the constants of the distribution of $\bar{s}$, we have, by equations (7),

$$\beta_1 = \frac{1}{n}, \quad \text{and} \quad \beta_2 = 3 + \frac{1}{n},$$

which indicates a positively skewed and leptokurtic distribution. The effect of
increasing $u$ and holding $t$ constant is to increase the skewness, as shown in the
following table for $t = 5$:


t      u      $\beta_1$ (means)

5      5      1/n
5      6      1.05/n
5      7      1.16/n
5      8      1.31/n
5      9      1.46/n


The degrees of skewness and kurtosis met with in practical cases of matching
with any considerable number of judges ($n$) are such that a Pearson Type III
distribution curve gives a reasonably good fit to the distribution of mean
numbers of correct matchings. If, therefore, we have to determine the significance
of any obtained mean number of correct matchings, we may resort to Salvosa’s
tables⁵ of the area under the Type III curve.

As a concrete example of the application of this method let us imagine that 10
judges have arranged 5 character sketches against 8 specimens of handwriting,
5 of which are true apposites of the sketches. Let the total number of correct
matchings achieved by this group be 12, whence the mean number per judge is
1.2. We have, then,

$$\bar{s} = 1.2, \quad n = 10, \quad t = 5, \quad u = 8,$$

whence

$$a = \frac{t}{u} = .625, \quad b = \frac{t-1}{u-1} = .571, \quad c = \frac{t-2}{u-2} = .500.$$

We now find the mean, standard deviation, and $\beta_1$ of the distribution of $\bar{s}$, as
follows:

The mean of the distribution of $\bar{s}$ is, by sampling theory, the same as the mean
of the distribution of $s$.

$$\text{Mean} = a = .625.$$

The second moment of the distribution of $\bar{s}$ is, by sampling theory, $\dfrac{1}{n}$ times
the second moment of the distribution of $s$; whence, by equation (5),

$$\text{Standard deviation} = \sqrt{\frac{1}{n}\left[a(1+b) - a^2\right]} = .243.$$

And, by equations (5) and (7),

$$\beta_1 = \frac{1}{n}\cdot\frac{\left[a(1 + 3b + bc) - 3a^2(1+b) + 2a^3\right]^2}{\left[a(1+b) - a^2\right]^3} = .13.$$

Now the obtained mean number of correct matchings was 1.2, and the next
lower number which could have occurred (corresponding to a total of 11 instead
of 12 for the group of judges) is 1.1. The lower boundary of the class-interval
whose midpoint is $\bar{s} = 1.2$ is therefore 1.15; and it is the area above this boundary
under the curve of $\bar{s}$ in which we are interested.

⁵ L. R. Salvosa, Tables of Pearson’s Type III function, Ann. Math. Statist., 1, 1930,
191-198.





The deviation of this boundary from the mean of $\bar{s}$ is

$$1.15 - .625 = .525,$$

and this deviation expressed in terms of the standard deviation gives

$$\frac{.525}{.243} = 2.16.$$

Entering Salvosa’s table for the deviation 2.16 and skewness $= \sqrt{\beta_1} = .36$, we
find by interpolation that so good a performance should be expected by chance
only about 23 times in 1000.
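
The arithmetic of this example, up to the point of entering the Type III tables, can be retraced in a few lines. This is a minimal sketch with a hypothetical function name; it uses floating-point arithmetic, takes the class boundary half a step (0.5/n) below the observed mean, and does not attempt the table look-up itself.

```python
from math import sqrt


def matching_example(total_correct=12, n=10, t=5, u=8):
    """Standard deviation of the mean, standardized deviation, and skewness."""
    a = t / u
    b = (t - 1) / (u - 1)
    c = (t - 2) / (u - 2)
    mu2 = a * (1 + b) - a ** 2
    mu3 = a * (1 + 3 * b + b * c) - 3 * a ** 2 * (1 + b) + 2 * a ** 3
    sd_mean = sqrt(mu2 / n)                # standard deviation of the mean
    beta1_mean = (mu3 ** 2 / mu2 ** 3) / n  # equation (7)
    boundary = total_correct / n - 0.5 / n  # lower class boundary, 1.15 here
    deviation = (boundary - a) / sd_mean
    return round(sd_mean, 3), round(deviation, 2), round(sqrt(beta1_mean), 2)
```

Called with the defaults, the routine returns 0.243, 2.16 and 0.36, the figures used above.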



MOMENTS ABOUT THE ARITHMETIC MEAN OF A BINOMIAL 
FREQUENCY DISTRIBUTION 

W. J. Kirkham, Oregon State College 

Although the most useful moments of a binomial distribution have been 
derived as a function of the parameters of the generating binomial for any 
binomial frequency series, a generalization of notation and procedure is well 
worth our consideration. The problem attempted in this paper is the calcula¬ 
tion of the moments about the mean for the general frequency series of Table I. 

TABLE I

The generalized binomial frequency series

x (item)      f (frequency)

0             $N \cdot {}_nC_0\, p^0 q^n$
1             $N \cdot {}_nC_1\, p^1 q^{n-1}$
2             $N \cdot {}_nC_2\, p^2 q^{n-2}$
...
n             $N \cdot {}_nC_n\, p^n q^0$


In calculating the moments of a set of data about any value, it is often found
convenient to use an arbitrary origin, define the moments about this value, and
represent the desired moments in terms of those calculated. In the general
binomial series, the origin of the $x$'s is found to be the best arbitrary origin.
These intermediate moments are

$$\nu_1 = \frac{\sum f x}{N} = M, \ \text{arithmetic mean}; \qquad \nu_2 = \frac{\sum f x^2}{N}; \qquad \ldots; \qquad \nu_n = \frac{\sum f x^n}{N}, \qquad (1)$$

where $\nu_i$ is the $i$th moment.

The moments ($\mu$'s) about the mean are easily defined as functions of the $\nu$'s
from fundamental definitions of the $\mu$'s. Denoting the $i$th moment by $\mu_i$ we
have

$$\begin{aligned}
\mu_1 &= \frac{\sum f(x - M)}{N} = 0, \\
\mu_2 &= \frac{\sum f(x - M)^2}{N} = \nu_2 - \nu_1^2, \\
\mu_3 &= \frac{\sum f(x - M)^3}{N} = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3.
\end{aligned} \qquad (2)$$

In general,

$$\mu_n = \nu_n - {}_nC_1\,\nu_{n-1}\nu_1 + {}_nC_2\,\nu_{n-2}\nu_1^2 + \cdots + (-1)^{n-1}\left({}_nC_{n-1} - 1\right)\nu_1^n. \qquad (3)$$

Or, if we let $\{\nu\}^n = \nu_n$, we may express the $n$th moment by a simple notation.

$$\mu_n = \{\mu\}^n = \{\nu\}^n - {}_nC_1\{\nu\}^{n-1}\nu_1 + {}_nC_2\{\nu\}^{n-2}\nu_1^2 + \cdots = \left(\{\nu\} - \nu_1\right)^n. \qquad (4)$$

Solving the equation for $\{\nu\}$,

$$\{\nu\} = \{\mu\} + \nu_1.$$

Raising both sides to the $n$th power and substituting for the “brace” notation,

$$\nu_n = \mu_n + {}_nC_1\,\mu_{n-1}\nu_1 + {}_nC_2\,\mu_{n-2}\nu_1^2 + \cdots + \nu_1^n.$$

Whence

$$\mu_n = \nu_n - {}_nC_1\,\mu_{n-1}\nu_1 - {}_nC_2\,\mu_{n-2}\nu_1^2 - \cdots - \nu_1^n, \qquad (5)$$

a semi-recursion formula.

The original formula for $\mu_n$ contained $n$ moments or variables; and since there
are only $(n - 2)$ of the $\mu$'s which are of lower order than $\mu_n$, it is necessary to
retain $\nu_n$ and $\nu_1$ in (5). Since $\mu_1 = 0$, one term in the expansion of $\mu_n$ is zero.
For instance, when $n = 5$, we have

$$\mu_5 = \nu_5 - 5\mu_4\nu_1 - 10\mu_3\nu_1^2 - 10\mu_2\nu_1^3 - \nu_1^5.$$
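
The semi-recursion formula (5) lends itself to a short loop in which each $\mu_k$ is built from the lower $\mu$'s together with $\nu_k$ and $\nu_1$. The sketch below is illustrative; the names `raw_moments` and `central_from_raw` are hypothetical, the raw moments are taken for unit total frequency ($N = 1$), and `p` is assumed to be supplied as a `Fraction` so that the arithmetic is exact.

```python
from fractions import Fraction
from math import comb


def raw_moments(n, p, kmax):
    """nu_0 .. nu_kmax of the binomial series (q + p)^n about x = 0."""
    q = 1 - p
    return [sum(Fraction(x) ** k * comb(n, x) * p ** x * q ** (n - x)
                for x in range(n + 1))
            for k in range(kmax + 1)]


def central_from_raw(nu):
    """Moments about the mean from the semi-recursion formula (5).

    nu[k] is the k-th moment about the origin, with nu[0] = 1 and
    at least nu[1] present.
    """
    mu = [Fraction(1), Fraction(0)]  # mu_0 = 1, mu_1 = 0
    for k in range(2, len(nu)):
        lower = sum(comb(k, j) * mu[k - j] * nu[1] ** j for j in range(1, k))
        mu.append(nu[k] - lower - nu[1] ** k)
    return mu
```

For $n = 6$, $p = 1/3$ the loop returns $\mu_2 = 4/3$ and $\mu_3 = 4/9$, that is, $npq$ and $npq(q - p)$.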


To calculate $\mu_k$, it is necessary to calculate the $\nu$'s from $\nu_1$ to $\nu_k$. For the
binomial series, these $\nu$'s are

$$\begin{aligned}
\nu_1 &= 1 \cdot npq^{n-1} + 2\,\frac{n(n-1)}{1 \cdot 2}\,p^2q^{n-2} + 3\,\frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\,p^3q^{n-3} + \cdots + np^n \\
&= np\left[q^{n-1} + \frac{n-1}{1!}\,pq^{n-2} + \frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + p^{n-1}\right] \\
&= np(q + p)^{n-1} = np, \\
\nu_2 &= np\left[1 \cdot q^{n-1} + 2\,\frac{n-1}{1!}\,pq^{n-2} + 3\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + np^{n-1}\right], \\
\nu_3 &= np\left[1^2 q^{n-1} + 2^2\,\frac{n-1}{1!}\,pq^{n-2} + 3^2\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^2p^{n-1}\right], \\
\nu_k &= np\left[1^{k-1} q^{n-1} + 2^{k-1}\,\frac{n-1}{1!}\,pq^{n-2} + 3^{k-1}\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^{k-1}p^{n-1}\right].
\end{aligned}$$

In the simplified form of $\nu_k$, the [ ] is the $(k-1)$th moment about $-1$ of the
binomial series generated by the binomial $(q + p)^{n-1}$. Denoting this [ ] by
$\nu'_{k-1}(n-1)$, the $\nu$'s can be expressed by the formula

$$\nu_k = np\,\nu'_{k-1}(n-1), \qquad (6)$$

where $\nu'$ is a function of $(n-1)$ and $(k-1)$ while $\nu_k$ was a function of $n$ and $k$.

Let us see how a $\nu'$ in $\nu_k$ can be defined in terms of the $\nu$'s of lower order
than $k$. In finding this relationship, a consideration of the two series of Table II
will be helpful.




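TABLE II

x'      f                                           x        f

1       $N \cdot {}_{n-1}C_0\, p^0 q^{n-1}$         0        $N \cdot {}_{n-1}C_0\, p^0 q^{n-1}$
2       $N \cdot {}_{n-1}C_1\, p^1 q^{n-2}$         1        $N \cdot {}_{n-1}C_1\, p^1 q^{n-2}$
...
n       $N \cdot {}_{n-1}C_{n-1}\, p^{n-1} q^0$     n - 1    $N \cdot {}_{n-1}C_{n-1}\, p^{n-1} q^0$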


The [ ] in $\nu_k$ for Table I is equal to the $(k-1)$th moment of $x'$ about $x' = 0$.
Or

$$\nu_{k-1},\ \text{Table II},\ x', \ = \nu'_{k-1},\ \text{Table I}, \ = \nu'_{k-1}(n-1).$$

Also $\nu_{k-1}$ for $x$, Table II, is $\nu_{k-1}$ for the series generated by $(q + p)^{n-1}$.

The desired relationship between the $\nu$'s for the two series of Table II can
be found by making use of the equations expressing the equality of the $\mu$'s for
$x$ and $x'$. Dropping the variable which shows the number of items, the same
for the two series of Table II, in the notation, we have

$$\begin{aligned}
\mu'_2 &= \mu_2 = \nu_2 - \nu_1^2 = \nu'_2 - \nu'^{\,2}_1, \\
\nu'_2 &= \nu_2 - 2\nu_1(\nu_1 - \nu'_1) + (\nu_1 - \nu'_1)^2, \\
\mu'_3 &= \mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 = \nu'_3 - 3\nu'_2\nu'_1 + 2\nu'^{\,3}_1, \\
\nu'_3 &= \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 + 3\nu'_2\nu'_1 - 2\nu'^{\,3}_1.
\end{aligned}$$

Substituting the value of $\nu'_2$ in the right member of $\nu'_3$,

$$\nu'_3 = \nu_3 - 3\nu_2(\nu_1 - \nu'_1) + 3\nu_1(\nu_1 - \nu'_1)^2 - (\nu_1 - \nu'_1)^3.$$







In general,

$$\nu'_k = \nu_k - {}_kC_1\,\nu_{k-1}(\nu_1 - \nu'_1) + {}_kC_2\,\nu_{k-2}(\nu_1 - \nu'_1)^2 + \cdots + (-1)^k(\nu_1 - \nu'_1)^k. \qquad (7)$$

The formula just derived may be used to define the moments about any
origin in terms of those about the original zero of the $x$'s. For our immediate
use, the formula simplifies since $\nu'_1 = \nu_1 + \nu_0 = \nu_1 + 1$. Then

$$\nu'_k = \nu_k + {}_kC_1\,\nu_{k-1} + {}_kC_2\,\nu_{k-2} + {}_kC_3\,\nu_{k-3} + \cdots \qquad (8)$$

By simple analysis we found the value of $\nu_1$ to be $np$. By the method of
continuation, we are able to extend the list of $\nu$'s to any number. $\nu'$ from (8)
is used in (6) with $n$ replaced by $(n - 1)$ in the $\nu$'s.

$$\begin{aligned}
\nu_0 &= 1. \\
\nu_1 &= np. \\
\nu_2 &= np\,\nu'_1(n-1) = np\left[\nu_1(n-1) + \nu_0(n-1)\right] \\
&= np\left[(n-1)p + 1\right] = n(n-1)p^2 + np. \\
\nu_3 &= np\,\nu'_2(n-1) = np\left[\nu_2(n-1) + 2\nu_1(n-1) + \nu_0(n-1)\right] \\
&= n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np. \\
\nu_4 &= np\,\nu'_3(n-1) = np\left[\nu_3(n-1) + 3\nu_2(n-1) + 3\nu_1(n-1) + \nu_0(n-1)\right] \\
&= np\bigl\{\left[(n-1)(n-2)(n-3)p^3 + 3(n-1)(n-2)p^2 + (n-1)p\right] \\
&\qquad + 3\left[(n-1)(n-2)p^2 + (n-1)p\right] + 3\left[(n-1)p\right] + 1\bigr\} \\
&= n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np.
\end{aligned}$$


If the order of the terms in the expansion is reversed, $\nu_n$ is an ascending power
series in $p$. The pure numerical coefficients in some of these $\nu$'s are

$\nu_1$ = (1)

$\nu_2$ = (1, 1)

$\nu_3$ = (1, 3, 1)

$\nu_4$ = (1, 7, 6, 1)

$\nu_5$ = (1, 15, 25, 10, 1)

$\nu_6$ = (1, 31, 90, 65, 15, 1)

$\nu_7$ = (1, 63, 301, 350, 140, 21, 1)

$\nu_8$ = (1, 127, 966, 1701, 1050, 266, 28, 1).
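
The method of continuation can be carried out mechanically on the coefficients alone. In the sketch below (hypothetical names throughout), `coef[k][j]` is the numerical coefficient of $n(n-1)\cdots(n-j+1)\,p^j$ in $\nu_k$; each new row is obtained from (6) and (8), using the fact that multiplying the $j$th term of $\nu_i(n-1)$ by $np$ gives a term of order $j+1$ in $n$. The check against a direct summation assumes `p` is a `Fraction`.

```python
from fractions import Fraction
from math import comb, perm


def nu_coefficients(kmax):
    """coef[k][j] multiplies n(n-1)...(n-j+1) p^j in nu_k, for k = 0..kmax."""
    coef = [[1]]  # nu_0 = 1
    for k in range(1, kmax + 1):
        row = [0] * (k + 1)
        for i in range(k):  # nu_i(n-1), weighted by C(k-1, i) as in (8)
            for j, value in enumerate(coef[i]):
                row[j + 1] += comb(k - 1, i) * value  # factor np raises order
        coef.append(row)
    return coef


def nu_from_coefficients(k, n, p, coef):
    """Evaluate nu_k from its coefficient row."""
    return sum(c * perm(n, j) * p ** j for j, c in enumerate(coef[k]))


def nu_direct(k, n, p):
    """nu_k of the binomial series by direct summation (unit total frequency)."""
    q = 1 - p
    return sum(Fraction(x) ** k * comb(n, x) * p ** x * q ** (n - x)
               for x in range(n + 1))
```

The rows produced agree with the list above; they are the numbers usually called Stirling numbers of the second kind.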






In general, 

*»+! = (l, Z «t\, t, (nC, E -1 C,V 

1 2 ' 1 ' (9) 

Using the foregoing $\nu$'s, and the semi-recursion formula, we are able to
determine the $\mu$'s.

$$\begin{aligned}
\mu_2 &= \nu_2 - \nu_1^2 \\
&= [np + n(n-1)p^2] - (np)^2 \\
&= np(1 - p) \\
&= npq.
\end{aligned}$$

$$\begin{aligned}
\mu_3 &= \nu_3 - 3\nu_1\mu_2 - \nu_1^3 \\
&= [np + 3n(n-1)p^2 + n(n-1)(n-2)p^3] - 3(np)[np(1-p)] - [np]^3 \\
&= np + (-3n)p^2 + (2n)p^3 = np(1 - 3p + 2p^2) \\
&= np(1-p)(1-2p) \\
&= npq(q - p).
\end{aligned}$$

$$\begin{aligned}
\mu_4 &= [np + 7n(n-1)p^2 + 6n(n-1)(n-2)p^3 + n(n-1)(n-2)(n-3)p^4] \\
&\qquad - 4(np)(np)(1 - 3p + 2p^2) - 6(np)^2(np)(1-p) - (np)^4 \\
&= np + (-7n + 3n^2)p^2 + (12n - 6n^2)p^3 + (-6n + 3n^2)p^4 \\
&= np(1 - 7p + 12p^2 - 6p^3) + 3n^2p^2(1 - 2p + p^2) \\
&= npq - 6np^2q^2 + 3n^2p^2q^2.
\end{aligned}$$

$$\begin{aligned}
\mu_5 &= np(1 - 15p + 50p^2 - 60p^3 + 24p^4) + 10n^2p^2(1 - 4p + 5p^2 - 2p^3) \\
&= (q - p)(npq - 12np^2q^2 + 10n^2p^2q^2).
\end{aligned}$$

$$\begin{aligned}
\mu_6 &= np(1 - 31p + 180p^2 - 390p^3 + 360p^4 - 120p^5) + 5n^2p^2(5 - 36p + 83p^2 - 78p^3 + 26p^4) \\
&\qquad + 15n^3p^3(1 - 3p + 3p^2 - p^3) \\
&= npq - 30np^2q^2(q - p)^2 + 25n^2p^2q^2 - 130n^2p^3q^3 + 15n^3p^3q^3.
\end{aligned}$$

$$\begin{aligned}
\mu_7 &= np(1 - 63p + 602p^2 - 2100p^3 + 3360p^4 - 2520p^5 + 720p^6) \\
&\qquad + n^2p^2(56 - 686p + 2590p^2 - 4270p^3 + 3234p^4 - 924p^5) \\
&\qquad + n^3p^3(105 - 525p + 945p^2 - 735p^3 + 210p^4) \\
&= (q - p)(npq - 60np^2q^2 + 360np^3q^3 + 56n^2p^2q^2 - 462n^2p^3q^3 + 105n^3p^3q^3).
\end{aligned}$$

$$\begin{aligned}
\mu_8 &= np(1 - 127p + 1932p^2 - 10206p^3 + 25200p^4 - 31920p^5 + 20160p^6 - 5040p^7) \\
&\qquad + n^2p^2(119 - 2394p + 13895p^2 - 35700p^3 + 46004p^4 - 29232p^5 + 7308p^6) \\
&\qquad + n^3p^3(490 - 3850p + 10990p^2 - 14770p^3 + 9520p^4 - 2380p^5) \\
&\qquad + n^4p^4(105 - 420p + 630p^2 - 420p^3 + 105p^4) \\
&= npq\bigl(1 - 42pq(3 - 40pq(1 - 3pq))\bigr) + 7n^2p^2q^2\bigl(17 - 4pq(77 - 261pq)\bigr) \\
&\qquad + 70n^3p^3q^3(7 - 34pq) + 105n^4p^4q^4.
\end{aligned}$$


ON CERTAIN DISTRIBUTION FUNCTIONS WHEN THE LAW OF THE 
UNIVERSE IS POISSON’S FIRST LAW OF ERROR¹

By Frank M. Weida 

Introduction. The median, which is that value of a permuted variable
which has as many observed values on one side of it as on the other, appears to
be the natural competitor of the arithmetic mean when we are interested in the
probable or most probable value of an unknown quantity. It is well known²
that the law of probability, namely, Poisson’s first law of error, which results
from the assumption that the median is the most probable value of the unknown
quantity is given by

$$f(x) = \frac{k}{\sigma}\, e^{-|x|/\sigma}. \qquad (1)$$

Little is known about the form of the distribution functions of the more 
important statistics when the law of the “Universe” is Poisson’s first law of 
error. It, therefore, appears to be of interest and importance to enlarge our 
present knowledge of distribution functions by finding certain new ones when 
the variable or variables are defined by (1). 

In this paper we present the following results: (1) We have obtained an 
explicit expression for the distribution of means of samples of n; (2) we have 
obtained an explicit expression for the distribution of differences; (3) we have 
obtained an explicit expression for the distribution of quotients; (4) we have 
obtained an explicit expression for the distribution of standard deviations for 
samples of n \ (5) we have obtained an explicit expression for the distribution of 
geometric means for samples of n; (6) we have obtained an explicit expression 
for the distribution of harmonic means for samples of w. 

In our analysis, we have made use of the theory of characteristic functions in
the sense of Levy.³ This theory has been extended to more than one dimension
by V. Romanovsky⁴ and by E. K. Haviland.⁵ S. Kullback,⁶ in his thesis, has
made further extensions and has applied them successfully to the distribution
problem in statistics.

¹ Presented to the American Mathematical Society, February 23, 1933.

² Brunt, David: “The Combination of Observations,” 1923, p. 27.

³ Levy, P.: “Calcul des Probabilités,” pp. 133-191.

⁴ Romanovsky, V.: “Sur un théorème limite du calcul des probabilités,” Recueil
mathématique de la Société mathématique de Moscou, Vol. 36, 1926, pp. 36-64.

⁵ Haviland, E. K.: “On the inversion formula for Fourier-Stieltjes transforms in more
than one dimension,” American Journal of Mathematics, Vol. 57, 1935, pp. 94-101.

⁶ Kullback, S.: “An application of characteristic functions to the distribution problem
of statistics,” Annals of Mathematical Statistics, Vol. V, No. 4, pp. 263-307.





The explicit expression for the distribution of arithmetic means of samples of 
n is not new. This law of distribution has previously been obtained otherwise 
by F. Hausdorff⁷ and by A. T. Craig.⁸ It is inserted here to show the superiority
and greater power of our method when compared with previous methods and 
for the completeness of our discussion. The other results offered in this paper, 
as far as the writer knows, are new. 


1. The distribution of arithmetic means. Let us consider

$$f(x) = \frac{k}{\sigma}\, e^{-|x|/\sigma}, \qquad (-a < x < a). \qquad (2)$$

If we assume that $x_1, x_2, \ldots, x_n$ are independently distributed and that each
$x_i$ ($i = 1, 2, \ldots, n$) is distributed according to the same distribution law, namely,
Poisson’s first law of error, then it is fairly easy to see that the characteristic
function for the law of distribution of means of samples of $n$ is given by


<#>(<) = 





If u — 2 t x t (i = 1, 2, • • • , n), then it follows that the distribution function 
of u, namely, P(u), is given by 


which, upon simplification becomes 


PM = 


2"“' k" f“ <lt 
~iro" J-a (1 - cit )' 1 ' 


(5) 


It is readily seen that the poles of the integrand are of the $n$th order and are
those of $(1 - \sigma i t)^n$. It follows by the well known Residue Theorem of Cauchy⁹
that


PM = 


2 n ~'k" (- 1 )"-' 1 d"-' 

' *■ (n — 1)! i" dt n ~ l 


e-‘“ _ \ 

(i + city] 



a t 


(«) 


If now, we replace u by n | x |, we will obtain the desired law of the distribu¬ 
tion of arithmetic means of samples of n which is 


p{ m) 


2 W k n ( — l ) n_l n d n ~ l / e~ ltnU{ \ 
a n i n i (n ~ 1 )! * dtT 1 \(1 + vit) n j 


(7) 


defined for all values of x on the range (— a < x < a). 


⁷ Hausdorff, F.: Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsischen
Gesellschaft der Wissenschaften zu Leipzig, Berichte über die Verhandlungen,
Math.-Phys. Classe, Vol. 53, 1901, pp. 152-178.

⁸ Craig, A. T.: “On the distribution of certain statistics,” American Journal of
Mathematics, Vol. 54, 1932, pp. 353-366.

⁹ Macrobert, T. M.: “Functions of a Complex Variable,” 1933, pp. 57, 295.





A. T. Craig⁸ has given the distribution laws of arithmetic means of samples of
size 2, 3, and 4. These results as well as the results for any n are readily ob¬ 
tained from (7). 


2. The distribution of differences. Let us assume that the laws of distribution
of $x$ and $y$ are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\, e^{-|x|/\sigma_1}; \qquad f(y) = \frac{k_2}{\sigma_2}\, e^{-|y|/\sigma_2}; \qquad (-a < x < a),\ (-a < y < a).$$

In this case, the characteristic function of the law of distribution of
differences $(x - y)$ is given by

M 


4>(0 


b r « t /1^i — — 1 b n -iqvi 

-- 1 / e 9 i dxh / e -2 ( Uj. 
J—a < 7 % J-a 


Performing the operations indicated in (8) and simplifying, we find that 
(.) _ 4fciA : 2 1 _ 1 

O'10-2 (1 — (Tilt) (1 + V'lit) * 

It is fairly easy to see that the distribution law of u is given by 

e -uu ( n 


PM = 4U ' 2 - f° 

&TT(T\0'2 J-a 


(1 — -j" 

Now, let {(1/a i) — ?7| = v/u, then (10) becomes 


pm= -r 2k ' k - ie — r Ti+ ' a 

jrf'<ri<r 2 (<ri + cr t ) J 


(- »’) 


e~ v dv 

1 + ... 


\ C\0'2. J) 


The integral in (11) is convergent because 

er v dv 

1 + 


Lim v n 


= 0 , 


(- v) -j 

Hence, we find that 

p (M ) . _ r 

TTlCriO^Ox + 0 " 2 ) Ja 


(—"t -u) 

\ <T\CT2 / 


e~ v dv 


(-*) 


i + 


(fiinOj 

\ J) 


( 8 ) 


(9) 


( 10 ) 


01 ) 


(12) 





which upon simplification becomes 

4kik 2 


P(u) 


W 


<Ti<T 2 (ai + <r 2) 0 * 2 1 0\02 


0 1 + 02 


«}» 


where W n 1 < ai -~— (T - u\ is the confluent hypergeometric function.™ 
*2 l ^2 J 

It is well known that 


(13) 


w , v 6 2 V ( a -fc-i+mA / V“2* m Ji 


for all values of A: and m and for all values of 2 except negative real values. 
Clearly, 


f . 'l *1*2 2 /* a 

w , M l = £_ / c~‘ rf< 

0f 2 \ <ri<r 2 / r(l) Jo 


which, upon simplification becomes 


W 


0 , 


01 + 02 


2 L 0102 


_* 1^*2 u 


Hence, we now find that 

POO = 


I < *9 


4A*iA * 2 

-\ c 

0\0Ml + 02) 


(14) 


Oft) 


If now we replace $u$ by $|x| - |y|$, we will obtain the desired law of
distribution of differences, which is


Cfl-rl-UI) = 


ikikt 


1 ||x| -M| 


+ °i) 


( 10 ) 


3. The distribution of ratios. We assume that the laws of distribution of
$x$ and $y$ are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\, e^{-|x|/\sigma_1}; \qquad f(y) = \frac{k_2}{\sigma_2}\, e^{-|y|/\sigma_2}; \qquad (-a < x < a),\ (-a < y < a).$$


10 Whittaker, E. T. and Watson, G. N.: "A course in modern Analysis,” 1915, pp. 333- 
334. 





FRANK M. WEIDA 


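Let $u = \log|x| - \log|y|$. The characteristic function of the law of
distribution of quotients is then given by

\[
\begin{aligned}
\phi(t) &= \frac{k_1}{\sigma_1}\int_{-\infty}^{\infty} e^{-|x|/\sigma_1}\,(|x|)^{it}\,dx \cdot \frac{k_2}{\sigma_2}\int_{-\infty}^{\infty} e^{-|y|/\sigma_2}\,(|y|)^{-it}\,dy \\
&= \frac{4k_1k_2}{\sigma_1\sigma_2}\int_{0}^{\infty} e^{-x/\sigma_1}\,x^{it}\,dx \int_{0}^{\infty} e^{-y/\sigma_2}\,y^{-it}\,dy.
\end{aligned}
\tag{17}
\]

Now, let $s = x/\sigma_1$ and $w = y/\sigma_2$, then clearly

\[ \phi(t) = 4k_1k_2\,\sigma_1^{it}\sigma_2^{-it}\int_{0}^{\infty} e^{-s}s^{it}\,ds \int_{0}^{\infty} e^{-w}w^{-it}\,dw, \]

whence

\[ \phi(t) = 4k_1k_2\,\sigma_1^{it}\sigma_2^{-it}\,\Gamma(it + 1)\,\Gamma(1 - it). \tag{18} \]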

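It follows that the distribution law of u is given by

\[ P(u) = \frac{4k_1k_2}{2\pi}\int_{-\infty}^{\infty} e^{-itu + it\log\sigma_1 - it\log\sigma_2}\,\Gamma(it + 1)\,\Gamma(1 - it)\,dt \]

which upon simplification, becomes

\[ P(u) = \frac{4k_1k_2}{2\pi}\int_{-\infty}^{\infty} e^{-it(u - \log\sigma_1 + \log\sigma_2)}\,\Gamma(it + 1)\,\Gamma(1 - it)\,dt. \tag{19} \]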


Now, let (1 — it) == — y, then (19) becomes 

/>(„)= 4 /ClA L 2 / 14 *" »i»g» 2 ) r(2 _|_ i;) r(—1>) d». (20) 

27 TZ J-i ia 

Since it can be shown that 11 

(1/27 ri) f 1 + 1 “ er-T(2 + c) r(- a) dv = F(2){1 + (1/c*))" 2 , 

J— l — t or 


we find that (20) becomes 

Pin) = r (2) (1 + «X* . (21) 

02 ( 0 ‘ 2 e M J 

Now, put e u = \ x\/\y\ = R, whence from (21) we will obtain the desired law 
of distribution of quotients which is 


TO = 


4^(711X2) 

0*2 R 



(22) 
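A rough simulation check of the law of quotients may be helpful here. The sketch below is not taken from the paper: it assumes k_1 = k_2 = 1/2, so that |x| and |y| are exponential variates with means σ_1 and σ_2, and it compares the simulated value of P(R ≤ r) with rσ_2 / (σ_1 + rσ_2), a closed form obtained by direct integration for this illustration rather than read from (22). The function names, seed and trial count are arbitrary.

```python
import random


def ratio_cdf(r, s1, s2):
    """P(|x|/|y| <= r) for independent variates whose absolute values are
    exponential with means s1 and s2 (the case k1 = k2 = 1/2)."""
    return r * s2 / (s1 + r * s2)


def simulate_ratio_cdf(r, s1, s2, trials=200_000, seed=12345):
    """Monte Carlo estimate of P(|x|/|y| <= r)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ax = rng.expovariate(1.0 / s1)  # |x|, mean s1
        ay = rng.expovariate(1.0 / s2)  # |y|, mean s2
        if ax <= r * ay:
            hits += 1
    return hits / trials


s1, s2, r = 1.5, 0.7, 2.0
exact = ratio_cdf(r, s1, s2)
estimate = simulate_ratio_cdf(r, s1, s2)
print(exact, estimate)
```

Differentiating the closed form with respect to r gives a density proportional to (σ_1 + rσ_2)^{-2}, which is the inverse-square shape one would expect for a quotient law of this kind.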


11 Macrobert, T. M., “Functions of a Complex Variable,” 1933, pp. 114, 139, 151. 
Whittaker, E. T. and Watson, G. N., “A course in modern Analysis/ 1915, pp. 283. 





4. The distribution of variances and standard deviations. If we assume
that the variance and standard deviation are calculated about a sample mean
and if we let $u = \sum_{i=1}^{n-1} x_i^2$, and if the $x_i$ are independently
distributed and each $x_i$ is distributed according to the same distribution
law, namely, Poisson’s first law of error, then it is clear that the
characteristic function for the law of distribution of variances of samples of n is


-u: 


e * ax 


f: 


\n-l 


Let / represent the integral in the right-hand member of (23). We obtain 

_ L 

that (di/da) = //a 2 , whence I = Ce Making use of the conditions: 


<r —» a , 


> ttx2 dx — e A 


[rr, \Ztt 


a —> a, Ce a —> C, whence we find that 


fa „ x 1 

L c lX ~dx = e<" 


\Ar 

—- e a . 


Clearly, it follows that 


2 nl k ,li e 4 


We now find that the distribution law of u is given by 


(n — 1 ) t » n— t n 1 


P(u) = 


'2ircr’ > ~ 1 


j:~a 


Evaluating the integral in (25) with a suitably chosen contour, 11 * we find that 


n— l __ n —1 

„ O— n -3 


, 2 n ~ l k n ~ l w * e - 2 tt ^ _ 

P(u) =- 7 --c— « 2 c 


- r (^) 


Now, let m = x 2 = ns 2 , whence from (26) we will obtain the desired law 

t *=1 

of distribution of variances which is 


n — 1 n — l 


n— 3 « 3 


P(s 2 ) = --- j --- lT — n 2 (s 2 ) 2 <T*\ 

18 Macrobert, T. M., “Functions of a Complex Variable,” 1933, p. (»7. 





The law of distribution of standard deviations can be obtained at once from
(27) since $d(s^2) = 2s\,ds$.

We shall now give the specific laws of distribution of variances for samples of
size 1, 2, 3, 4, and 5 when the law of the “Universe” is Poisson's first law of error.
From (27),

For n = 1, 


P(s 2 ) = 0, 


For n = 2, 


P(s 2 ) 


I -L-„ 2 
2 2 ke °e 


For n — 3, 


P(s 2 ) 


4fc 2 7TC a G 


(0 < a* < oo). (28) 


(0 < S 2 < oo ) . 


(29) 


(0 < s 2 < oo ) . (30) 


For ft = 4, 


P(s 2 ) = 


32fcVe 'sc * 8 
<r a ~ 


(0 < s 2 < oo). (31) 


For ft = 5, 


/v) = 


SOfcVe 'sV * s 


(0 < «* < oo). (32) 


5. The distribution of geometric means. As before, we assume that the
$x_i$ are independently distributed and each $x_i$ is distributed according to the same
distribution law, namely, Poisson’s first law of error. Then, clearly, the
characteristic function for the law of distribution of geometric means of samples of n is


*U) = 



Now, put s = x/o, then (33) becomes 


*(<) = 



2 H k n o’ n, {\'(it -f l)j". 


It follows at once that the distribution law of u is 


P(u) = 


2»fc» r« 

2ir J- tt 




(33) 


(34) 


(35) 





Now, let it + 1 = — v, then (35) becomes 

/ -l+ia 

e v(u+nlog<r) {r(_t;) }*dv. 

1—7 a 


_On bn 

P{u) = ■ e M + nloK<r 


2iri 

It is well known that (10) 


[«-,))• - . 

sin w ttv{T(v -f 1) } w 


Using (37) in (36), we readily find that 


_On bn 

P(u ) = ■ e u+wloKJ 


27Ti 


{F(v + l)) n sin w 7rt> 


dv. 


(36) 


(37) 


(38) 


It is fairly easy to see that the poles of the integrand in (38) are the poles of
$\{\Gamma(-v)\}^n$ and that these poles are of the $n$th order. Applying the well known
Residue Theorem of Cauchy (8), we find that


P(u ) = 2 n k n e u+nXoga 


S (_l)n-W»a+1 j rfn-l |” gr(u+n log a) “j| 

(n- D! I dv^ Lmr+ l)!^/^/ (39) 

Now, since $u = \log|x_1| + \log|x_2| + \cdots + \log|x_n|$, then clearly, the
distribution law of the geometric mean, G, is obtained from the law of distribution
for u by means of the transformation

\[ u = \log (G)^n. \]

Hence, from (39), we find the desired law of distribution of geometric means 
of samples of n which is 

pig i = 2nkn(Jnan y r .1\ (40) 

r(«) H ] W-'L|1>+D ) KW> 


6. The distribution of harmonic means. Let us assume that f(x) is the
law of distribution for x. It is well known¹³ that the law of distribution of
x' = 1/x is given by

\[ F(x') = (1/x'^2)\, f(1/x') \]

if 1/x is continuous on the range of definition of f(x). Now, in case f(x) is
Poisson’s first law of error, we find that

\[ F(x') = F(1/x) = \frac{k}{\sigma}\, x^2 e^{-|x|/\sigma}; \quad (-\infty \le x < 0),\ (0 < x \le \infty). \tag{41} \]

13 Dodd, E. L., “The frequency law of a function of one variable,” Bulletin of the Amer¬ 
ican Mathematical Society, Vol. 31, 1925, p. 28; “The frequency law of a function of vari¬ 
ables with given frequency laws,” Annals of Mathematics, Second Series, Vol. 27, 1925-20, 
p. 18. 





We assume that the $x_i'$ are independently distributed and each $x_i'$ is
distributed according to the same law of distribution, whence we find that the
characteristic function for the law of distribution of harmonic means of samples
of n is


4(0 


- a : 


x t dx 




from which, after simplification, we find that 

k n 2 n a 2n 


4(0 = 


(1 — ait) Zn * 

We now find that the law of distribution for u is 


«/ , 2 n k n a 2n f a e~ itu 

p( ”> - S- i-.Trr^F*. 


which, after evaluation and simplification, becomes 


(42) 


(43) 


POO = 


2 n k n 

<r'*r(3n) 


u_ 

u 3n ~ l e *. 


(44) 


Recalling that in this case, $u = 1/|x_1| + 1/|x_2| + \cdots + 1/|x_n|$, we make
the transformation $u = n/H$, where H is the harmonic mean; whence, from
(44), we find that the desired law of distribution of harmonic means of samples
of n is given by


run = 


2 n h n 


■ Tp-* n e 


« 

Vh m 


(45) 


7. Conclusions. We have shown that the same analysis is applicable to find 
the explicit expression for all the distribution laws we have discussed in this 
paper. 

The George Washington University, 

Washington, D. C. 



ON THE PROBLEM OF CONFIDENCE INTERVALS 

By J. Neyman 

When discussing my paper read before the Royal Statistical Society on 19th 
June, 1934, Professor Fisher said that the extension of his work concerning the 
fiducial argument to the case of discontinuous distributions, as presented in 
my paper, has been reached at a great expense: that instead of exact probability 
statements we get only statements in the form of inequalities. 

This remark raises the question whether the disadvantage of the solution 
which he mentioned (the inequalities instead of equalities) results from the un¬ 
satisfactory method of approach, or whether it is connected with the nature of 
the problem itself. 

I think that the problem is of considerable general interest. For instance it
may be asked whether the confidence intervals for the binomial distribution
recently published by E. S. Pearson and C. J. Clopper,¹ which correspond to
the probability statements in inequalities, could be bettered.

The purpose of the present note is to show, (1) that in some exceptional cases
the exact probability solution of the problem exists and that then it may easily
be found by the method described in Note I of my paper;² (2) that in the general
case of discontinuous distribution exact probability statements in the problem
of confidence intervals are impossible.

In particular it will be seen that exact probability statements are impossible 
in the case of the binomial distribution and so that the system of confidence 
intervals published by Clopper and Pearson could not be bettered. 

In order to avoid any possible misunderstanding I shall start by restating
the problem.

We shall consider a random discontinuous variate x, capable of having one
or another of a finite, or at most denumerable set of values

\[ x_1, x_2, \cdots x_n, \cdots \tag{1} \]

We shall assume that the frequency function, say $p(x \mid \theta)$, of x depends upon one
parameter $\theta$, the value of which is unknown. The problem of confidence
intervals consists in ascribing to every possible value of x, e.g. to $x_n$, (n = 1, 2, ...)
a “confidence interval,” say $\theta_1(n)$ to $\theta_2(n)$, such that the probability, P, of our
being correct in stating

\[ \theta_1(n) \le \theta \le \theta_2(n) \tag{2} \]

whenever we observe $x = x_n$ (n = 1, 2, ...), is either:

¹ E. S. Pearson and C. J. Clopper: The Use of Confidence or Fiducial Limits in the
Case of the Binomial. Biometrika, Vol. XXVI, pp. 404-413.

² J. R. S. S., Vol. 97, p. 589.




(a) equal to a given value $\alpha < 1$ chosen in advance, or

(b) at least equal to this value $\alpha$.

I proposed to call this chosen value $\alpha$ the confidence coefficient.

In the earlier paper I showed that the solution of the problem in its form (b) 
is always possible and easy to find. If the variate x is continuous, then the 
solution of the problem (a) is equally easy. At present we shall consider whether 
and under what conditions the solution (a) is possible when the variate x is 
discontinuous. 
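The meaning of a solution of form (b) can be illustrated numerically for the binomial case. The sketch below is an illustration under stated assumptions, not the construction of the earlier paper: it builds equal-tailed intervals of the Clopper and Pearson kind by bisection on the binomial tail sums, then evaluates the probability of a correct statement for a grid of values of θ. The sample size, the grid, and all function names are arbitrary choices made for the example.

```python
from math import comb


def pmf(x, n, theta):
    """Binomial frequency function p(x | theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def upper_tail(x, n, theta):
    """P(X >= x | theta); increases with theta."""
    return sum(pmf(j, n, theta) for j in range(x, n + 1))


def lower_tail(x, n, theta):
    """P(X <= x | theta); decreases with theta."""
    return sum(pmf(j, n, theta) for j in range(0, x + 1))


def solve(f, target, increasing, iters=60):
    """Bisection on [0, 1] for f(theta) = target, f monotone in theta."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def interval(x, n, alpha):
    """Equal-tailed interval (theta_1(x), theta_2(x)) with coefficient alpha."""
    tail = (1 - alpha) / 2
    lower = 0.0 if x == 0 else solve(lambda th: upper_tail(x, n, th), tail, True)
    upper = 1.0 if x == n else solve(lambda th: lower_tail(x, n, th), tail, False)
    return lower, upper


def coverage(theta, n, intervals):
    """Sum of p(x | theta) over those x whose interval contains theta."""
    return sum(
        pmf(x, n, theta)
        for x, (lo, hi) in enumerate(intervals)
        if lo <= theta <= hi
    )


n, alpha = 10, 0.95
intervals = [interval(x, n, alpha) for x in range(n + 1)]
grid = [i / 100 for i in range(1, 100)]
cov = [coverage(th, n, intervals) for th in grid]
print(min(cov), max(cov))
```

On the grid the probability of a correct statement never falls below α but changes from one value of θ to another, which is the behaviour described by form (b) as opposed to form (a).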

Suppose that the variate x is discontinuous as described above, and that the
solution of the problem in its form (a) exists and is given by the system of
confidence intervals $(\theta_1(x_n), \theta_2(x_n))$, for n = 1, 2, ....

The position is illustrated in the diagram below. On the axis of abscissae
the possible values of the variate x are marked. The axis of ordinates is the
axis of $\theta$. The confidence intervals are marked on verticals passing through
corresponding values of x.


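[Diagram representing the confidence intervals. Abscissae: the possible values
$x_1, x_2, x_3, x_4, x_5, \cdots x_n$; ordinates: $\theta$. The interval from $\theta_1(n)$ to
$\theta_2(n)$ is drawn on the vertical through each $x_n$, and a horizontal line LL
crosses some of the intervals. • marks a point belonging to the set of
acceptance $X(\theta)$.]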


According to our hypothesis the intervals $(\theta_1(x_n), \theta_2(x_n))$ are so chosen that

\[ P = \alpha. \tag{3} \]

P is the probability of an event, say E, which we shall describe in some detail.
Let us denote generally the probability of any event a by P{a}. P{a | b} will
denote the probability of an event, a, calculated under the assumption that
another event, b, has already occurred.

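Now

\[
\begin{aligned}
P = P\{E\} &= \text{the probability that } \{\text{either } (x = x_1) \text{ and then } \theta_1(1) \le \theta \le \theta_2(1) \\
&\qquad \text{or } (x = x_2) \text{ and then } \theta_1(2) \le \theta \le \theta_2(2) \\
&\qquad \cdots \\
&\qquad \text{or } (x = x_n) \text{ and then } \theta_1(n) \le \theta \le \theta_2(n) \\
&\qquad \cdots\} \\
&= P\{x = x_1\}\,P\{\theta_1(1) \le \theta \le \theta_2(1) \mid (x = x_1)\} \\
&\quad + P\{x = x_2\}\,P\{\theta_1(2) \le \theta \le \theta_2(2) \mid (x = x_2)\} \\
&\quad + \cdots \\
&= \sum_{n=1}^{\infty} P\{x = x_n\}\,P\{\theta_1(n) \le \theta \le \theta_2(n) \mid (x = x_n)\} = \alpha.
\end{aligned}
\tag{4}
\]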

The calculation of the probability P in the above form is not convenient, as
both multipliers in each term of the sum in (4) depend upon the unknown
probability function a priori of $\theta$. Therefore we shall present P in another
form, giving to the event E a geometrical interpretation. Let us denote by
CB the set of all confidence intervals $(\theta_1(n), \theta_2(n))$, as marked on the plane of
x and $\theta$. Thus CB will be composed of points with co-ordinates x and $\theta$, where

\[ x = x_n, \qquad \theta_1(n) \le \theta \le \theta_2(n), \qquad n = 1, 2, \cdots \tag{5} \]


The set CB will be called the confidence belt. 

Denote by A any point of the plane of x and $\theta$, having any values for its
co-ordinates.

It is easily seen that the event, which we denote by E, and the probability
of which is $P = \alpha$, consists in the point A belonging to the confidence belt CB.
In fact the event E occurs if and only if the co-ordinates of A fulfil the
conditions (5). But just these conditions define the points belonging to CB.

The above circumstance allows us to calculate P by means of a formula which
discloses its connection with $p(x \mid \theta)$.

Fix any possible value of $\theta = \theta'$ and draw the straight line LL the points
of which have just this fixed value $\theta'$ for their ordinates. The line LL will cut
some of the confidence intervals. Denote by $X(\theta')$ the set of points of
intersection, and by $\phi(\theta)$ the unknown frequency function of $\theta$. The set $X(\theta)$ will
be called the set of acceptance corresponding to the specified value of $\theta$.

The function $\phi(\theta)$ may be continuous or not. So may be $p(x \mid \theta)$ considered
as a function of $\theta$. These cases may be treated together if we agree that
$\sum_{\theta} F(\theta)$ will denote either the sum or the integral of $F(\theta)$ extending
over all values of $\theta$, whenever $F(\theta)$ is integrable.








Using this notation we may write

\[ P = P\{E\} = \sum_{\theta} \Bigl( \phi(\theta) \sum_{X(\theta)} p(x \mid \theta) \Bigr) \tag{6} \]

where $\sum_{X(\theta)}$ denotes the summation over all values of x belonging to $X(\theta)$.

From the formula (6) may be deduced the following important proposition.
The probability P may possess a constant value $\alpha$, independent of the properties
of the unknown function $\phi(\theta)$, if and only if for each $\theta$

\[ \sum_{X(\theta)} p(x \mid \theta) = \alpha. \tag{7} \]


The condition (7) is obviously sufficient to have $P = \alpha$. In fact, if it is satisfied,
then we should get from (6)

\[ P = \alpha \sum_{\theta} \phi(\theta) = \alpha \tag{8} \]

since $\sum_{\theta} \phi(\theta) = 1$ whatever the frequency distribution of $\theta$. It is equally
easy to see that the condition (7) is necessary for having $P = \alpha$ whatever the
function $\phi(\theta)$. For suppose that for $\theta = \theta_1$ we have

\[ \sum_{X(\theta_1)} p(x \mid \theta_1) = \beta \ne \alpha. \tag{9} \]

Then if it happens, that

\[ \phi(\theta_1) = 1 \tag{10} \]

and

\[ \phi(\theta) = 0 \quad \text{for } \theta \ne \theta_1, \tag{11} \]

the only term in the sum which is different from zero will be that
corresponding to $\theta = \theta_1$ and the formula (6) will reduce to

\[ P = \sum_{X(\theta_1)} p(x \mid \theta_1) = \beta \ne \alpha. \tag{12} \]
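A small worked case may make the role of condition (7) concrete. The numbers below are invented for the illustration and do not come from the paper; $c_1$ and $c_2$ are shorthand introduced here for the two inner sums.

```latex
% Assume theta can take only the two values theta_1, theta_2, and write
%   c_j = \sum_{X(\theta_j)} p(x \mid \theta_j),  j = 1, 2.
% Formula (6) then reads
\[
  P = \phi(\theta_1)\,c_1 + \phi(\theta_2)\,c_2,
  \qquad \phi(\theta_1) + \phi(\theta_2) = 1 .
\]
% With the hypothetical values c_1 = 0.96 and c_2 = 0.99:
\[
  \phi = (1, 0):\ P = 0.96, \qquad
  \phi = (0, 1):\ P = 0.99, \qquad
  \phi = (\tfrac12, \tfrac12):\ P = 0.975 .
\]
% P is free of the unknown \phi only when c_1 = c_2 = \alpha, which is (7).
```

In this two-valued case P can be made to take any value between $c_1$ and $c_2$ by a suitable choice of $\phi$, so a fixed value $\alpha$ is attainable for every $\phi$ only when the two sums coincide.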


The original question, whether the solution of the form (a) is possible when
the variate x is discontinuous is thus put in the following form: is it possible
to define for every possible value of $\theta$ a set of acceptance $X(\theta)$ such that the
equation (7) holds good?

The answer is: in some cases it may be possible, but this depends upon the
nature of the function $p(x \mid \theta)$. It is very easy to invent functions $p(x \mid \theta)$ for
which the equation (7) for a definite value of $\alpha$ holds good, and we may even
fix in advance the sets of acceptance $X(\theta)$. However the important question
is not whether there may exist elaborately invented cases of discontinuous
distributions where the solution (a) exists, but rather whether this solution
exists always, or at least whether it exists frequently and in cases which are
practically important.









This question must be answered in the negative on the basis of the following 
example concerning the most important of the discontinuous distributions, the 
Binomial. 

In fact it will be seen below that if x is a variate following the binomial
frequency law, then whatever the arrangement of the sets of acceptance $X(\theta)$,
corresponding to different values of $\theta$, the left hand side of the equation (7)
cannot be constantly equal to the confidence coefficient $\alpha < 1$. It will follow
that in the case of the binomial distribution, the solution of the problem (a)
is impossible.

To prove this we shall consider the variate, x, following the binomial frequency
law. That is to say we shall assume that x may have values 0, 1, 2, ... n;
and that

\[ p(x \mid \theta) = \frac{n!}{x!\,(n - x)!}\,\theta^{x}(1 - \theta)^{n - x} \tag{13} \]

while $0 < \theta < 1$. Since the set of possible values which x may have is finite,
therefore the set of all confidence intervals must be finite also. It follows that there
is possible only a finite number of sets of acceptance $X(\theta)$. Therefore there
must be at least one set of acceptance, say $X^0$, which will be common to an
infinite number of values of $\theta$, say $\theta_1, \theta_2, \cdots \theta_n, \cdots$ so that for each it will
be $X(\theta_n) = X^0$.

Now

\[ \sum_{X(\theta_n)} p(x \mid \theta_n) \tag{14} \]

for all these values of $\theta = \theta_n$ will be the same polynomial in $\theta$ of the order n.
If it has the same value $\alpha$ for a number of values of $\theta$ exceeding n, it means that
this polynomial is an absolute constant. Therefore if it were possible to give
a solution of the type (a) in the case of the binomial distribution, it would be
possible to construct a sum (14), the terms of which are all different and have
the form (13), and such that after all possible reductions and simplifications
all terms involving $\theta$ would cancel and we should be left only with one constant
term $\alpha < 1$. This, however, is impossible, since the only term of the form (13)
which involves a constant, is the term corresponding to x = 0

\[ p(0 \mid \theta) = (1 - \theta)^n = 1 - n\theta + \frac{n(n-1)}{2}\theta^2 - \cdots \tag{15} \]

and then this constant is 1. Other terms of the form (13) involve $\theta^x$ as a
multiplier. Therefore there exists only one sum of the form (14) which is an absolute
constant, but this includes all the terms (13)

\[ \sum_{x=0}^{n} p(x \mid \theta) = 1 \tag{16} \]

and thus is of no value. It follows that whatever the sets of acceptance $X(\theta)$
the corresponding sum (14) will have values varying with the value of $\theta$ and
hence the solution of the type (a) in the case of the binomial does not exist.
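The polynomial argument can be verified by brute force for a small n. The sketch below is an illustration, not part of the proof: it expands each term of the form (13) into integer coefficients of powers of θ, sums the terms over every subset of {0, 1, ..., n}, and records the subsets for which all coefficients of positive powers cancel. The value n = 5 and the function names are arbitrary.

```python
from itertools import combinations
from math import comb


def term_coeffs(x, n):
    """Integer coefficients, by power of theta, of
    C(n, x) * theta**x * (1 - theta)**(n - x)."""
    coeffs = [0] * (n + 1)
    for j in range(n - x + 1):
        coeffs[x + j] = comb(n, x) * comb(n - x, j) * (-1) ** j
    return coeffs


def constant_subsets(n):
    """All sets of acceptance whose sum of terms is free of theta,
    returned as (subset, constant value) pairs."""
    found = []
    for size in range(n + 2):
        for subset in combinations(range(n + 1), size):
            total = [0] * (n + 1)
            for x in subset:
                for power, c in enumerate(term_coeffs(x, n)):
                    total[power] += c
            if all(c == 0 for c in total[1:]):
                found.append((subset, total[0]))
    return found


result = constant_subsets(5)
print(result)
```

For n = 5 the search finds only the empty set, with constant 0, and the complete set of values, with constant 1, in agreement with the statement that no sum of the form (14) can equal a constant α < 1.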

This, I think, gives the solution of the question raised by Professor Fisher. 
It is clear also that whenever the solution of the type (a) exists, it may be 
found by a suitable choice of sets of acceptance, and thus by the method ex¬ 
plained in my earlier paper. 

I should like now to raise another question. Past experience shows that the
general problem of estimation may be formulated in different ways. The form
of this problem as it appears in Bayes' theorem required for its solution the
knowledge of the probabilities a priori.

The form of the same problem treated by R. A. Fisher in his theory of esti¬ 
mation was solved in terms of a new conception, that of likelihood. 

The problem of estimation in its form of confidence intervals stands entirely
within the bounds of the theory of probability, without involving any
conception not already inherent in this theory. In the case of continuous distribution
the problem also allows the solution (a) entirely independent of the probabilities
a priori. Now it is shown that the necessity of the solution (b) is bound up
with the nature of the problem if the distributions are discontinuous.

My question is: is it possible to formulate the problem of estimation in a
fourth form, leading to a solution which (1) stands entirely on the grounds of
the classical theory of probability, and (2) does not depend upon the
probabilities a priori, whatever the conditions of the problem?



ANALYSIS OF VARIANCE CONSIDERED AS AN APPLICATION OF 
SIMPLE ERROR THEORY 

By Walter A. Hendricks 

The need for an elementary presentation of the methods of analysis of vari¬ 
ance has been recognized by many investigators in various fields of research. 
A recent monograph by Snedecor (1934) is undoubtedly the most comprehensive 
attempt to satisfy this need which has appeared in the literature relating to 
the subject. Snedecor’s treatment of the subject consists largely of the presen¬ 
tation of a number of standard types of problems to which the methods of 
analysis of variance are applicable, directions for performing the necessary com¬ 
putations, and a discussion of the conclusions which may be drawn from the 
data on the basis of the analysis. 

In the opinion of the author of this paper, an elementary presentation of some 
of the theoretical considerations upon which the methods of analysis of variance 
are based would also be of some value. The methods of analysis of variance, 
as given by Fisher (1932), are presented as a natural consequence of intraclass 
correlation theory. However, the essential concepts may be presented in a 
more comprehensible form by the use of simple error theory. 

It seems appropriate to begin such a presentation with a definition of variance.
If we have an infinite number of measurements of the same quantity, the
variance of a single measurement is defined as the arithmetic mean of the
squares of the errors of those measurements. In actual practice, an infinite
number of measurements can never be obtained. We have instead a sample
of n measurements, $x_1, x_2, \cdots x_n$, from which the variance of a single
measurement may be estimated. By referring to any text on the method of least
squares, it may be verified that the best estimate, $S^2$, of the variance of a single
measurement which can be obtained from a sample of n measurements is given
by the equation:

\[ S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - m)^2 \tag{1} \]

in which m represents the arithmetic mean of the n measurements. The
quantity, n − 1, in the terminology of analysis of variance, is designated as
the number of degrees of freedom available for estimating $S^2$.
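Equation (1) can be written out in a few lines. The sketch below is an illustration only; the measurements are made-up values and the function name is hypothetical. The result is compared with `statistics.variance` from the Python standard library, which uses the same divisor n − 1.

```python
import statistics


def sample_variance(xs):
    """Estimate S^2 of equation (1): squared deviations from the
    arithmetic mean, divided by the n - 1 degrees of freedom."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


# Hypothetical measurements of a single quantity.
data = [9.8, 10.1, 10.4, 9.9, 10.3]
s2 = sample_variance(data)
print(s2, statistics.variance(data))
```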

It is often necessary to estimate $S^2$ from a number of different samples of
measurements. In such cases, the best estimate of $S^2$ is obtained by calculating
the weighted mean of the variances estimated from the individual samples, each
variance being weighted by the number of degrees of freedom which were
available for its estimation. The number of degrees of freedom upon which such an
estimate of $S^2$ is based is given by the sum of these weights. Such an estimate
of the variance of a single measurement is often designated as the variance
“within samples.”
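The weighted mean described above may be sketched as follows. The example is illustrative: the two samples are invented and the function name is hypothetical. Each sample variance is weighted by its own degrees of freedom, and the sum of the weights is returned alongside the estimate.

```python
def variance_within_samples(samples):
    """Weighted mean of the per-sample variances, each weighted by its
    degrees of freedom; returns (estimate, total degrees of freedom)."""
    weighted_sum = 0.0
    total_df = 0
    for xs in samples:
        n = len(xs)
        m = sum(xs) / n
        variance = sum((x - m) ** 2 for x in xs) / (n - 1)
        df = n - 1
        weighted_sum += df * variance
        total_df += df
    return weighted_sum / total_df, total_df


# Two hypothetical samples: variances 1 (2 d.f.) and 20/3 (3 d.f.).
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
within, df_within = variance_within_samples(samples)
print(within, df_within)
```

Because each weight times its variance is just that sample's sum of squared deviations, the estimate equals the total of those sums divided by the total degrees of freedom.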

In one of the simpler applications of analysis of variance, a number of samples 
of measurements are available, and the investigator is required to determine 
whether the magnitude of the quantity measured varied from sample to sample 
or whether all of the measurements may be regarded as having been made upon 
a quantity of the same magnitude. 

An estimate, $S^2$, of the variance within samples may be obtained. Since $S^2$
is an estimate of the variance of a single measurement, the variance, $S_i^2$, of the
arithmetic mean, $m_i$, of the measurements in any one sample is given by the
equation:

\[ S_i^2 = \frac{S^2}{n_i} \tag{2} \]


in which $n_i$ represents the number of measurements in the sample. Let there
be r samples. Then another estimate, $S_i'^2$, of the variance of the mean, $m_i$,
may be obtained from the observed distribution of the means, $m_1, m_2, \cdots m_r$,
by the use of the formula for calculating the variance of a weighted observation
as given in texts on the method of least squares:

\[ S_i'^2 = \frac{1}{n_i (r - 1)} \left[ n_1 (m_1 - m)^2 + n_2 (m_2 - m)^2 + \cdots + n_r (m_r - m)^2 \right] \tag{3} \]

in which:

\[ m = \frac{n_1 m_1 + n_2 m_2 + \cdots + n_r m_r}{n_1 + n_2 + \cdots + n_r} \tag{4} \]


Equations (2) and (3) yield two estimates of the variance of the mean, $m_i$.
It is apparent that these two estimates will be equal, within the limits of
sampling fluctuations, if all of the measurements in the r samples were made upon
a quantity of the same magnitude. If the magnitude of the quantity measured
varied from sample to sample, $S_i'^2$ will be greater than $S_i^2$. However, in actual
practice, the two estimates of the variance of a particular mean are not
compared directly. An equivalent comparison is made between two estimates of
the variance of a single measurement. The first of these is nothing more than
the variance within samples discussed earlier in this paper. The second
estimate, which may be designated by $S'^2$, is the value which would have to be
substituted for $S^2$ in equation (2) in order to make $S_i^2$ equal to the value given
for $S_i'^2$ by equation (3). It is quite apparent that $S'^2$ may be found by the
use of the equation:

\[ S'^2 = \frac{1}{r - 1} \left[ n_1 (m_1 - m)^2 + n_2 (m_2 - m)^2 + \cdots + n_r (m_r - m)^2 \right] \tag{5} \]






$S'^2$ is often designated as the variance “between samples.” A comparison of
$S'^2$ with $S^2$ is obviously equivalent to a comparison of $S_i'^2$ with $S_i^2$.

If $S'^2$ is greater than $S^2$, a statistic, z, may be calculated:

\[ z = \frac{1}{2} \log_e \frac{S'^2}{S^2} \tag{6} \]

This statistic serves as a useful comparison between $S'^2$ and $S^2$ since its sampling
distribution is known if all of the measurements comprising the data under
investigation were made upon a quantity of the same magnitude. The
distribution of z, under these conditions, is given by an equation of the form:

\[ df = \frac{k\, e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1 + n_2)}}\, dz \tag{7} \]

in which $n_1$ represents the number of degrees of freedom available for estimating
$S'^2$, and $n_2$ represents the number of degrees of freedom available for estimating
$S^2$. It is apparent from equation (5) that r − 1 degrees of freedom are
available for the estimation of $S'^2$ in the particular problem under discussion.

When any estimate of the variance of a single measurement is multiplied by 
the number of degrees of freedom available for making that estimate, the re¬ 
sulting product is known as a “sum of squares.” The additive property of 
the sums of squares and the degrees of freedom contributes much to the elegance 
of the scheme of analysis just presented and is of considerable practical impor¬ 
tance in problems of a type to be discussed later in this paper. In the case 
of the problem discussed above, the additive property of the sums of squares 
provides that the sum of the “sum of squares between samples” and the “sum 
of squares within samples” is equal to the sum of the squares of the deviations 
of all of the measurements from their arithmetic mean. The additive property 
of the degrees of freedom provides that the sum of the “degrees of freedom 
between samples” and the “degrees of freedom within samples” is equal to the 
“total degrees of freedom” which is nothing more than the total number of 
measurements diminished by unity. 
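The scheme of analysis and its additive property can be traced on a small set of figures. The sketch below is an illustration under stated assumptions: the three samples are invented, and the function and dictionary key names are hypothetical. It computes the weighted mean of equation (4), the variance between samples of equation (5), the variance within samples, and the statistic z of equation (6), together with the three sums of squares.

```python
from math import log


def one_way_analysis(samples):
    """Between- and within-sample variances, z, and the sums of squares."""
    r = len(samples)
    sizes = [len(xs) for xs in samples]
    means = [sum(xs) / len(xs) for xs in samples]
    total_n = sum(sizes)
    # Weighted mean of the sample means, equation (4).
    m = sum(n * mi for n, mi in zip(sizes, means)) / total_n
    ss_between = sum(n * (mi - m) ** 2 for n, mi in zip(sizes, means))
    ss_within = sum((x - mi) ** 2 for xs, mi in zip(samples, means) for x in xs)
    ss_total = sum((x - m) ** 2 for xs in samples for x in xs)
    df_between = r - 1
    df_within = total_n - r
    var_between = ss_between / df_between  # S'^2, equation (5)
    var_within = ss_within / df_within     # S^2, variance within samples
    # Equation (6); used in the text only when S'^2 exceeds S^2.
    z = 0.5 * log(var_between / var_within)
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "df_total": total_n - 1, "z": z,
    }


# Hypothetical measurements in three samples of unequal size.
samples = [[4.1, 3.9, 4.3], [5.0, 5.2, 4.8, 5.1], [6.1, 5.9, 6.3]]
result = one_way_analysis(samples)
print(result)
```

The sums of squares between and within samples add to the total sum of squares, and the corresponding degrees of freedom add to the total number of measurements diminished by unity.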

The methods of analysis presented above may be applied to any study of the 
effects of a number of experimental treatments of the same kind upon the 
magnitude of a measurable quantity. If experimental treatments of more 
than one kind are imposed simultaneously, the effects of each may be studied 
by modifications of those methods. The discussion of those modifications, 
about to be presented in this paper, is limited to data which may be classified 
in an “r X s” table, i.e., to studies of the effects of only two kinds of experi¬ 
mental treatments. More complex problems may be treated by simple ex¬ 
tensions of the methods presented. 

Consider an “r X s” table composed of rs cells, each of which contains a
number of measurements of some quantity. The magnitude of the quantity
measured may vary from cell to cell, but the essential conditions under which
the measurements were made must be the same for all cells. It is also
understood that no cell may be empty. Table 1 is an example of such a table. The
individual measurements have not been represented. Only the number of
measurements, $n_{ij}$, in each cell and the arithmetic mean, $m_{ij}$, of those
measurements have been indicated. The arguments, $a_i$, represent r experimental
treatments of one kind, and the arguments, $b_j$, represent s experimental
treatments of another kind. The problem to be solved is to ascertain whether or
not the differences among the experimental treatments of each kind had any
effect on the magnitude of the quantity measured.


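TABLE 1

Example of an “r X s” Table Showing Only the Number of Measurements in
Each Cell and the Arithmetic Mean of Those Measurements

\[
\begin{array}{c|cccccc}
       & b_1    & b_2    & b_3    & b_4    & \cdots & b_s    \\ \hline
a_1    & m_{11} & m_{12} & m_{13} & m_{14} & \cdots & m_{1s} \\
       & n_{11} & n_{12} & n_{13} & n_{14} & \cdots & n_{1s} \\
a_2    & m_{21} & m_{22} & m_{23} & m_{24} & \cdots & m_{2s} \\
       & n_{21} & n_{22} & n_{23} & n_{24} & \cdots & n_{2s} \\
a_3    & m_{31} & m_{32} & m_{33} & m_{34} & \cdots & m_{3s} \\
       & n_{31} & n_{32} & n_{33} & n_{34} & \cdots & n_{3s} \\
\vdots & \vdots & \vdots & \vdots & \vdots &        & \vdots \\
a_r    & m_{r1} & m_{r2} & m_{r3} & m_{r4} & \cdots & m_{rs} \\
       & n_{r1} & n_{r2} & n_{r3} & n_{r4} & \cdots & n_{rs}
\end{array}
\]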


If each cell contains the same number of measurements, the effects of the
experimental treatments indicated by the arguments, $a_i$, may be studied by
comparing the variance “between rows” with the variance “within cells.” The
variance between rows may be calculated by regarding the r rows as r samples
of measurements and applying an equation of the same form as equation (5).
The variance within cells may be obtained by calculating the variance of a
single measurement from the data in each cell separately and taking the mean
of the resulting values. The effects of the experimental treatments indicated
by the arguments, $b_j$, may be studied by comparing the variance “between
columns” with the variance “within cells.”

If the degrees of freedom between rows, between columns, and within cells
are added, the sum will be less than the total number of degrees of freedom
in the table. If the corresponding sums of squares are added, the sum is likely
to be less than the total sum of squares. The differences are due to what is
customarily designated as “interaction between rows and columns.” The
more descriptive term, “differential response,” is sometimes used to designate
the same factor. The nature of this factor may be investigated by considering
the effects of the experimental treatments, $b_j$, in each row of Table 1.

The data in each cell of Table 1 may be regarded as a sample of measure¬ 
ments. Therefore, the data in any row may be regarded as a set of s samples 
of measurements. By applying an equation of the same form as equation (5) 
to the data in any row, an estimate of the variance of a single measurement is 
obtained from the observed distribution of the means of the cells in that row. 
By calculating the arithmetic mean of the estimates for the r rows, an estimate 
of the variance of a single measurement is obtained from r(s — 1) degrees of 
freedom. This estimate may be designated as the variance “between cells in 
the same row.” 

The variance between cells in the same row measures the average effect of
differences among the experimental treatments, $b_j$, in individual rows. The
variance between columns, which was discussed earlier in this paper, is
calculated from s − 1 degrees of freedom and measures the effect of differences
among the treatments, $b_j$, on the assumption that the effect of any one
treatment upon the magnitude of the quantity measured was constant for every row.
The number of degrees of freedom assignable to differential response of the
various rows to the treatments, $b_j$, is r(s − 1) − (s − 1) or (r − 1)(s − 1).
The sum of squares due to differential response is given by the difference
between the sum of squares between cells in the same row and the sum of squares
between columns. These relations follow from the additive property of degrees
of freedom and sums of squares.
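The subtraction rule for differential response can be checked on a small table with the same number of measurements in every cell. The sketch below is an illustration only: the cell means and the common cell size are invented, and the function name is hypothetical. It obtains the sum of squares for differential response by subtraction, as described above, and compares it with the value computed directly from the deviations $m_{ij} - m_{i\cdot} - m_{\cdot j} + m$, a direct form that is an assumption of this example rather than a formula quoted from the paper.

```python
def differential_response(cell_means, n_per_cell):
    """Sum of squares and degrees of freedom for differential response
    in an r X s table with n_per_cell measurements in every cell."""
    r = len(cell_means)
    s = len(cell_means[0])
    row_means = [sum(row) / s for row in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(r)) / r for j in range(s)]
    grand = sum(row_means) / r
    # Sum of squares between cells in the same row, r(s - 1) d.f.
    ss_cells_in_rows = n_per_cell * sum(
        (cell_means[i][j] - row_means[i]) ** 2
        for i in range(r) for j in range(s)
    )
    # Sum of squares between columns, s - 1 d.f.; each column mean is
    # based on r * n_per_cell measurements.
    ss_columns = n_per_cell * r * sum((c - grand) ** 2 for c in col_means)
    by_subtraction = ss_cells_in_rows - ss_columns
    direct = n_per_cell * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r) for j in range(s)
    )
    df = r * (s - 1) - (s - 1)
    return by_subtraction, direct, df


# Hypothetical 2 X 3 table of cell means, 4 measurements per cell.
cell_means = [[10.0, 12.0, 11.0], [13.0, 12.5, 16.0]]
by_subtraction, direct, df = differential_response(cell_means, 4)
print(by_subtraction, direct, df)
```

With equal numbers in the cells the two routes give the same sum of squares, and the degrees of freedom reduce to (r − 1)(s − 1).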

It may be observed that precisely the same results would be obtained by
considering the effects of the treatments, $a_i$, in the various columns of Table 1.
The degrees of freedom and sum of squares due to differential response of the
various columns to the treatments, $a_i$, would be exactly equal to the
corresponding values obtained for the differential response of the various rows to the
treatments, $b_j$.

Up to this point the discussion has been concerned only with the special case 
in which each cell of Table 1 contains the same number of measurements. As 
a matter of fact, the methods given for the analysis of such data will yield 
correct results when applied to any “r × s” table in which the numbers of
measurements in the cells in every row are proportional to the corresponding 
marginal totals for the columns, and the numbers of measurements in the cells 
in every column are proportional to the corresponding marginal totals for the 
rows. 
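The proportionality condition just stated can be checked mechanically. The sketch below is only an illustration: the function name `is_proportional` and the nested-list layout of the cell counts are assumptions, not part of the paper. It tests whether every cell count equals the product of its row total and column total divided by the grand total, which is what proportionality to the marginal totals amounts to.

```python
from fractions import Fraction


def is_proportional(counts):
    """Return True when each cell count equals row_total * column_total / N.

    counts[i][j] is the number of measurements in row i, column j
    (a hypothetical layout chosen for this sketch).
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    return all(
        Fraction(n) == Fraction(row_totals[i] * col_totals[j], total)
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    )


# A 2 x 2 table whose rows are in the ratio 2 : 3 and columns 1 : 2.
print(is_proportional([[2, 4], [3, 6]]))
print(is_proportional([[2, 4], [3, 5]]))
```

Exact fractions are used so that the comparison does not depend on floating-point rounding.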

When the numbers of measurements in the various cells do not satisfy the 
above condition of proportionality, the distributions of the means of the rows 
and columns may be distorted, and, consequently, the methods of analysis 
described above may yield incorrect results. Efficient methods of analyzing 
such data have been presented by Yates (1933). A comprehensive discussion 
of these methods is considerably beyond the scope of this paper. One method,
described very briefly by Yates (1933) and designated as the “method of
weighted squares of means,” appealed to the author as being particularly 
valuable for practical work. No detailed discussion of the method seems to 
be available in the literature. Therefore, the following presentation may be 
of some interest. 

Consider the experimental treatments represented by the arguments, $a_i$, in
Table 1. It is necessary to find an average value for the magnitude of the 
quantity measured for each row of Table 1. However, this average must be 
of such a type that its value will not be distorted by the unequal numbers of 
measurements in the various cells. The unweighted arithmetic mean of the 
means of the cells in the row seems to be the logical average to use since, within 
the limits of sampling fluctuations, the value of this average will be identical 
with the value which would have been obtained if each cell had contained the 
same number of measurements. The averages for the r rows are: 


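$$
\begin{aligned}
m_{a_1} &= \frac{1}{s}\,(m_{11} + m_{12} + \cdots + m_{1s}) \\
m_{a_2} &= \frac{1}{s}\,(m_{21} + m_{22} + \cdots + m_{2s}) \\
&\;\;\vdots \\
m_{a_r} &= \frac{1}{s}\,(m_{r1} + m_{r2} + \cdots + m_{rs})
\end{aligned}
\tag{8}
$$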


By the law of propagation of error, the variance of any one of these unweighted 
means is given by the equation: 


$$S^2_{a_i} = \frac{1}{s^2}\,(S^2_{i1} + S^2_{i2} + \cdots + S^2_{is}) \tag{9}$$

in which $S^2_{a_i}$ is the variance of $m_{a_i}$, and $S^2_{i1}, S^2_{i2}, \ldots, S^2_{is}$ are the variances of
$m_{i1}, m_{i2}, \ldots, m_{is}$, respectively. If $S^2$ represents the variance of a single measurement,
equation (9) may be written in the form:


$$S^2_{a_i} = \left(\frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \cdots + \frac{1}{n_{is}}\right)\frac{S^2}{s^2} \tag{10}$$

The value of $S^2$ may be estimated from the individual measurements in the
various cells. $S^2$ is nothing more than the variance within cells, as customarily
calculated, and may be estimated from the N — rs degrees of freedom within 
cells, in which N represents the total number of measurements in Table 1. 

The variance of a single measurement may also be estimated from the observed
distribution of the means of the type, $m_{a_i}$. These means are not of equal weight.
Therefore, in order to find the variance of any one of them, it is first necessary
to calculate the weighted mean of the r individual means. Since the weight of
an arithmetic mean is inversely proportional to its variance, it is evident from
an inspection of equation (10) that the weight, $p_{a_i}$, of a mean, $m_{a_i}$, may be
found from the equation:

$$\frac{1}{p_{a_i}} = \frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \cdots + \frac{1}{n_{is}} \tag{11}$$

The weighted mean, $m_a$, may then be found:

$$m_a = \frac{p_{a_1} m_{a_1} + p_{a_2} m_{a_2} + \cdots + p_{a_r} m_{a_r}}{p_{a_1} + p_{a_2} + \cdots + p_{a_r}} \tag{12}$$


The variance, $S'^2_{a_i}$, of any mean, $m_{a_i}$, as estimated from the observed distribution
of means of this type, is given by:

$$S'^2_{a_i} = \frac{1}{p_{a_i}(r - 1)}\left[p_{a_1}(m_{a_1} - m_a)^2 + p_{a_2}(m_{a_2} - m_a)^2 + \cdots + p_{a_r}(m_{a_r} - m_a)^2\right] \tag{13}$$


By substituting $S'^2_{a_i}$ for $S^2_{a_i}$, and $S_a^2$ for $S^2$, in equation (10) and solving the
resulting equation for $S_a^2$, an estimate, $S_a^2$, of the variance of a single measurement
is obtained from the observed distribution of means of the type, $m_{a_i}$. It
is evident that, after making the indicated substitutions, equation (10) reduces
to the form:

$$S_a^2 = \frac{s^2}{r - 1}\left[p_{a_1}(m_{a_1} - m_a)^2 + p_{a_2}(m_{a_2} - m_a)^2 + \cdots + p_{a_r}(m_{a_r} - m_a)^2\right] \tag{14}$$


It is interesting to observe that, if the numbers of measurements in the re¬ 
spective cells were equal, equation (14) would reduce to the formula for calcu¬ 
lating the variance “between rows” as customarily applied in analysis of 
variance. 

The two estimates, $S^2$ and $S_a^2$, of the variance of a single measurement may
be compared in the usual manner by taking one-half of the natural logarithm
of the ratio of the larger estimate to the smaller and making use of the tables
of the values of “z” given by Fisher (1932). When using these tables, it is
important to remember that $S_a^2$ was estimated from $r - 1$ degrees of freedom.
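The row analysis described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's computing scheme: the data layout (`cells[i][j]` as a list of measurements), the function names, and the factor $s^2/(r - 1)$ in the between-rows estimate (the form obtained by substituting into equation (10)) are all assumptions of the sketch.

```python
import math


def weighted_squares_of_means_rows(cells):
    """Return (estimate from row means, variance within cells).

    cells[i][j] is the list of measurements in row i, column j
    (hypothetical layout; every cell must hold at least one value).
    """
    r, s = len(cells), len(cells[0])
    means = [[sum(c) / len(c) for c in row] for row in cells]
    # Equation (8): unweighted mean of the cell means in each row.
    m_a = [sum(row) / s for row in means]
    # Equation (11): the reciprocal weight is the sum of reciprocal cell counts.
    p_a = [1.0 / sum(1.0 / len(c) for c in row) for row in cells]
    # Equation (12): weighted mean of the row means.
    m_bar = sum(p * m for p, m in zip(p_a, m_a)) / sum(p_a)
    # Equation (14), assumed factor s^2 / (r - 1): r - 1 degrees of freedom.
    s2_a = s * s / (r - 1) * sum(p * (m - m_bar) ** 2 for p, m in zip(p_a, m_a))
    # Variance within cells: N - rs degrees of freedom.
    n_total = sum(len(c) for row in cells for c in row)
    ss_within = sum(
        (x - means[i][j]) ** 2
        for i, row in enumerate(cells)
        for j, c in enumerate(row)
        for x in c
    )
    return s2_a, ss_within / (n_total - r * s)


def fisher_z(v1, v2):
    """One-half the natural logarithm of the larger estimate over the smaller."""
    return 0.5 * math.log(max(v1, v2) / min(v1, v2))


cells = [[[1, 3], [2, 4]], [[5, 7], [6, 8]]]
s2_a, s2 = weighted_squares_of_means_rows(cells)
print(s2_a, s2, fisher_z(s2_a, s2))
```

With equal numbers in every cell, as in the usage example, the first estimate coincides with the ordinary variance between rows, which is the check mentioned in the text.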

The method of analysis just described may be employed to study the effects
of differences among the experimental treatments indicated by the arguments,
$b_j$, on the magnitude of the quantity measured. The unweighted means for
the s columns are:

$$
\begin{aligned}
m_{b_1} &= \frac{1}{r}\,(m_{11} + m_{21} + \cdots + m_{r1}) \\
m_{b_2} &= \frac{1}{r}\,(m_{12} + m_{22} + \cdots + m_{r2}) \\
&\;\;\vdots \\
m_{b_s} &= \frac{1}{r}\,(m_{1s} + m_{2s} + \cdots + m_{rs})
\end{aligned}
\tag{15}
$$









The weight, $p_{b_j}$, of a mean of the type, $m_{b_j}$, may be found from the relation:

$$\frac{1}{p_{b_j}} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} + \cdots + \frac{1}{n_{rj}} \tag{16}$$

A weighted mean, $m_b$, may be calculated:

$$m_b = \frac{p_{b_1} m_{b_1} + p_{b_2} m_{b_2} + \cdots + p_{b_s} m_{b_s}}{p_{b_1} + p_{b_2} + \cdots + p_{b_s}} \tag{17}$$

An estimate, $S_b^2$, of the variance of a single measurement may be obtained from
the observed distribution of means of the type, $m_{b_j}$, by the use of the equation:

$$S_b^2 = \frac{r^2}{s - 1}\left[p_{b_1}(m_{b_1} - m_b)^2 + p_{b_2}(m_{b_2} - m_b)^2 + \cdots + p_{b_s}(m_{b_s} - m_b)^2\right] \tag{18}$$

$S_b^2$ may be compared with $S^2$ in the usual manner.

If it is necessary to study the “interaction between rows and columns,” the
effects of the experimental treatments, $b_j$, may be studied for each individual
row of Table 1. Consider the distribution of the means of the cells in a row
designated by the argument, $a_i$. The weight of any one of these means is
equal to the number of measurements in the cell. A weighted mean, $m'_{a_i}$, of
the s means of cells in the row may be calculated:

$$m'_{a_i} = \frac{n_{i1} m_{i1} + n_{i2} m_{i2} + \cdots + n_{is} m_{is}}{n_{i1} + n_{i2} + \cdots + n_{is}} \tag{19}$$

The variance, $S'^2_{ij}$, of the mean, $m_{ij}$, for any cell in the given row, as estimated
from the observed distribution of means of this type, may be obtained from the
equation:

$$S'^2_{ij} = \frac{1}{n_{ij}(s - 1)}\left[n_{i1}(m_{i1} - m'_{a_i})^2 + n_{i2}(m_{i2} - m'_{a_i})^2 + \cdots + n_{is}(m_{is} - m'_{a_i})^2\right] \tag{20}$$

The variance, $S^2_{ij}$, of the same mean, as estimated from the distribution of the
individual measurements in the cell, may be obtained from the equation:

$$S^2_{ij} = \frac{S^2}{n_{ij}} \tag{21}$$


By substituting $S'^2_{ij}$ for $S^2_{ij}$, and $S^2_{a_i b}$ for $S^2$, in equation (21) and solving the
resulting equation for $S^2_{a_i b}$, an estimate, $S^2_{a_i b}$, of the variance of a single measurement
is obtained from the observed distribution of the means of the cells
in the given row. After making the indicated substitutions, equation (21)
reduces to the form:

$$S^2_{a_i b} = \frac{1}{s - 1}\left[n_{i1}(m_{i1} - m'_{a_i})^2 + n_{i2}(m_{i2} - m'_{a_i})^2 + \cdots + n_{is}(m_{is} - m'_{a_i})^2\right] \tag{22}$$












Such an estimate, $S^2_{a_i b}$, of the variance of a single measurement may be
obtained for each of the r rows in Table 1. By calculating the average
of the variances of the type, $S^2_{a_i b}$, an estimate, $S^2_{ab}$, of the variance of a single
measurement may be obtained from the $r(s - 1)$ degrees of freedom between
cells in the same row:

$$S^2_{ab} = \frac{1}{r(s - 1)} \sum_{i=1}^{r} \left[n_{i1}(m_{i1} - m'_{a_i})^2 + n_{i2}(m_{i2} - m'_{a_i})^2 + \cdots + n_{is}(m_{is} - m'_{a_i})^2\right] \tag{23}$$


Equation (23) is identical with the formula for calculating the variance between 
cells in the same row as ordinarily applied in analysis of variance. This result 
is a direct consequence of the fact that the unequal numbers of measurements 
in the various cells had no distorting effect on the arithmetic means for indi¬ 
vidual cells. 
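As a rough illustration of the variance between cells in the same row, the following sketch pools the weighted sums of squares of the cell means about the weighted row mean over all rows and divides by $r(s - 1)$. The data layout and the function name are assumptions made for the example only.

```python
def variance_between_cells_in_rows(cells):
    """Pooled estimate from r(s - 1) degrees of freedom.

    cells[i][j] is the list of measurements in row i, column j
    (hypothetical layout chosen for this sketch).
    """
    r, s = len(cells), len(cells[0])
    total = 0.0
    for row in cells:
        counts = [len(c) for c in row]
        means = [sum(c) / len(c) for c in row]
        # Mean of the row, each cell mean weighted by its number of measurements.
        m_row = sum(n * m for n, m in zip(counts, means)) / sum(counts)
        total += sum(n * (m - m_row) ** 2 for n, m in zip(counts, means))
    return total / (r * (s - 1))


cells = [[[1, 3], [2, 4, 6]], [[5], [7, 9]]]
print(variance_between_cells_in_rows(cells))
```

Because each cell mean is weighted by its own count, unequal numbers in the cells cause no difficulty here, in agreement with the remark in the text.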

The presence or absence of interaction may be verified by comparing $S^2_{ab}$
with $S_b^2$. In general, the actual variance due to interaction can not be obtained
by the “weighted squares of means” method, for the various sums of squares 
do not possess the additive property when the analysis is made in this way. 
However, the comparison suggested above will yield sufficient information for 
most practical purposes. 

For the special case in which r or s is equal to 2, the actual variance due to
interaction may be calculated. Suppose r = 2 in Table 1. The following
method, suggested by Yates (1933), yields an estimate of the variance due to
interaction from a consideration of the differences, $d_j$, between the means of
the two cells in each column:

$$
\begin{aligned}
d_1 &= m_{11} - m_{21} \\
d_2 &= m_{12} - m_{22} \\
&\;\;\vdots \\
d_s &= m_{1s} - m_{2s}
\end{aligned}
\tag{24}
$$


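The variance, $S^2_{d_j}$, of any difference, $d_j$, is given by the equation:

$$S^2_{d_j} = \left(\frac{1}{n_{1j}} + \frac{1}{n_{2j}}\right) S^2 \tag{25}$$

The weight, $p_j$, of the difference, $d_j$, is given by the equation:

$$\frac{1}{p_j} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} \tag{26}$$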


The variance of the difference, $d_j$, as estimated from the observed distribution
of differences, is given by the equation:

$$S'^2_{d_j} = \frac{1}{p_j(s - 1)}\left[p_1(d_1 - d)^2 + p_2(d_2 - d)^2 + \cdots + p_s(d_s - d)^2\right] \tag{27}$$

in which:

$$d = \frac{p_1 d_1 + p_2 d_2 + \cdots + p_s d_s}{p_1 + p_2 + \cdots + p_s} \tag{28}$$

By means of these relations, an estimate, $S_d^2$, of the variance of a single measurement
may be obtained from the observed distribution of the differences of the
type, $d_j$. This estimate represents the variance due to interaction and may be
obtained from the equation:

$$S_d^2 = \frac{1}{s - 1}\left[p_1(d_1 - d)^2 + p_2(d_2 - d)^2 + \cdots + p_s(d_s - d)^2\right] \tag{29}$$

It is quite apparent that $s - 1$ degrees of freedom are available for the estimation
of the variance due to interaction in this particular example.
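The two-row case lends itself to a short sketch. The code below is an illustration only; the function name and data layout are assumptions, and the weight of each difference is taken as the reciprocal of the sum of the reciprocal cell counts, which is the form the law of propagation of error gives for a difference of two means.

```python
def interaction_variance_two_rows(cells):
    """Estimate of the variance due to interaction when r = 2.

    cells[0][j] and cells[1][j] are the lists of measurements in the two
    cells of column j (hypothetical layout for this sketch).
    """
    top, bottom = cells
    s = len(top)
    # Difference between the two cell means in each column.
    d = [sum(a) / len(a) - sum(b) / len(b) for a, b in zip(top, bottom)]
    # Weight of each difference: 1/p_j = 1/n_1j + 1/n_2j (assumed form).
    p = [1.0 / (1.0 / len(a) + 1.0 / len(b)) for a, b in zip(top, bottom)]
    # Weighted mean of the differences.
    d_mean = sum(pj * dj for pj, dj in zip(p, d)) / sum(p)
    # Weighted sum of squares on s - 1 degrees of freedom.
    return sum(pj * (dj - d_mean) ** 2 for pj, dj in zip(p, d)) / (s - 1)


cells = [[[1, 3], [4, 6]], [[0, 2], [1, 3]]]
print(interaction_variance_two_rows(cells))
```

For the balanced 2 × 2 example shown, the result agrees with the interaction mean square of the ordinary analysis of variance, which is a useful sanity check on the weights.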

REFERENCES 

Fisher, R. A., 1932. Statistical Methods for Research Workers, 4th edition. Edinburgh
and London: Oliver and Boyd.

Snedecor, George W., 1934. Calculation and Interpretation of Analysis of Variance and
Covariance. Ames, Iowa: Collegiate Press.

Yates, F., 1933. The principles of orthogonality and confounding in replicated experi¬ 
ments. Jour. Agr. Sci., 23: 108-145. 


Bureau of Animal Industry, 

U. S. Department of Agriculture, 
Washington, D. C. 





NOTE ON THE DISTRIBUTIONS OF THE STANDARD DEVIATIONS 
AND SECOND MOMENTS OF SAMPLES FROM A 
GRAM-CHARLIER POPULATION 

By G. A. Baker 


T. N. Thiele in his “Theory of Observations” makes the following statement
with regard to the distributions of the higher half-invariants in samples of n:
“Not even for $\mu_2$ have I discovered the general law of errors.” 1 The purpose
of this paper is to shed some light on the distribution of $\mu_2$ and to give the distribution
of second moments about a fixed point when the sampled population
can be represented by a Gram-Charlier series.

The distribution of the second moments about a fixed point of samples is 
given in complete generality. It is known that if the sampled population is 
normal there is a simple relation between the distribution of the standard 
deviations of samples of n and the distribution of the second moments of the 
samples about the mean of the population. It was thought that such a relation 
might exist in case the sampled population could be represented by a Gram-
Charlier series. Such is not the case. Again, it was thought that by obtaining
the distribution of the standard deviations for samples of 2, 3, 4, ... it might
be possible to deduce empirically a general law of distribution. This proved an 
unfruitful line of investigation but required so much labor that the results 
should be reported to save others time and energy. 

First, suppose that a population may be represented as

(1) $f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_5\varphi_5(x) + \cdots$

where

$\varphi_i(x) = \dfrac{d^i \varphi_0(x)}{dx^i}.$


Then applying Theorem II of the author’s paper on “Random Sampling from 
Non-Homogeneous Populations” 2 we deduce at once the following theorem. 

Theorem I. The distribution of the second moments about the origin of
(1) of samples of n drawn at random from a population represented by (1) is
precisely the same as the distribution of the second moments about the same
point of samples of n drawn from a population represented by the first term of
(1), that is a normal population, and is proportional to $x^{\frac{n-2}{2}} e^{-\frac{n}{2}x}$ (loc. cit.).


1 Thiele, T. N., “The Theory of Observations,” reprinted in the Annals of Mathematical
Statistics, Vol. 2, No. 2, May, 1931, p. 208.

2 Metron, Vol. 8, No. 3, Feb. 28, 1930.




This is not so surprising as it may seem at first if it is remembered that the
odd subscript terms of a Gram-Charlier series slice off frequencies on one side
of the mean of $a_0\varphi_0(x)$ and add them onto the other side in the same manner.
If we suppose that a population is given as

(2) $f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) + \cdots$


in the same manner we get the following theorem. 

Theorem II. The distribution of the second moments measured from the
origin of (2) of samples of n drawn at random from (2) will be a combination
of distributions of the type of Theorem I with only even subscript terms contributing
anything. The variations in the component distributions will consist
of differences in the constant factors and the exponent of x, the estimate of the
second moment. The lowest exponent will be $\frac{n-2}{2}$.

For instance, if

(3) $f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x)$


and n = 2, the estimates of the second moment will be distributed as pro¬ 
portional to 


c * r (&o 3) 2 — 12a 4 (a 0 -f- 3).r -f- (36aj -f- 6a 0 a 4 -f- 18a 4 ) 


36a 4 -f- 9a 

•j! 


a .r 4 ] 

4 4! J 


Thus, it can be said that we know the distribution of the second moments of 
samples about a fixed point if the sampled population is of the Gram-Charlier 
type in the sense that given the number of terms necessary for an adequate 
representation and the number in the samples we can write down the desired 
distribution. However, this is not a simple matter. Further, if some relation 
existed between the distributions of the second moments about a fixed point 
and the standard deviations of the samples we would know the latter distribution 
also. Such a relation is not apparent for samples of 2 and 3. 

Let us investigate the correlation surfaces of the means and standard devi¬ 
ations of samples of 2 and 3 drawn at random from a population represented 
by the first few terms of a Gram-Charlier series after the method of Dr. A. T. 
Craig. 3 The distributions of the standard deviations can then be obtained 
immediately by integration. 

Suppose that 

(4) $f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x)$


3 Annals of Mathematical Statistics, Vol. 3, No. 2, May, 1932, pp. 126-140.


and that we are considering samples of 2. The probability of the concurrence
of $x_1$ and $x_2$ is

(5) $f(x_1)\,f(x_2)$

and

(6) $x_1 = s + \bar{x}, \qquad x_2 = -s + \bar{x}$

where s is the standard deviation and $\bar{x}$ is the mean of a sample of 2. By means
of (6), (5) becomes

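(7) $$
\begin{aligned}
e^{-(s^2 + \bar{x}^2)}\big[\,a_0^2 &+ a_0 a_3(-6s^2\bar{x} - 2\bar{x}^3 + 6\bar{x}) \\
&+ a_0 a_4(2s^4 + 12s^2\bar{x}^2 - 12s^2 - 12\bar{x}^2 + 6) \\
&+ a_3^2(-s^6 + 3s^4\bar{x}^2 + 6s^4 - 3s^2\bar{x}^4 - 9s^2 + 9\bar{x}^2 - 6\bar{x}^4 + \bar{x}^6) \\
&+ a_3 a_4(2s^6\bar{x} - 6s^4\bar{x}^3 - 6s^4\bar{x} + 6s^2\bar{x}^5 - 12s^2\bar{x}^3 + 18s^2\bar{x} - 2\bar{x}^7 + 18\bar{x}^5 - 42\bar{x}^3 + 18\bar{x}) \\
&+ a_4^2(s^8 - 4s^6\bar{x}^2 - 12s^6 + 6s^4\bar{x}^4 + 12s^4\bar{x}^2 + 42s^4 - 4s^2\bar{x}^6 \\
&\qquad + 12s^2\bar{x}^4 - 36s^2\bar{x}^2 - 36s^2 + \bar{x}^8 - 12\bar{x}^6 + 42\bar{x}^4 - 36\bar{x}^2 + 9)\,\big].
\end{aligned}
$$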
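One of the polynomial factors in (7) can be spot-checked numerically. The sketch below is not the author's derivation; it assumes that the $\varphi_i$ are derivatives of the normal curve, so that the third derivative carries the factor $-(x^3 - 3x)$, and that a sample of 2 is written as $\bar{x} + s$ and $\bar{x} - s$. Under those assumptions the coefficient of $a_3^2$ should equal the product of the two cubic factors. The names `he3`, `a3_squared_term`, and `product_of_third_derivatives` are introduced only for this illustration, and `m` stands for the sample mean.

```python
def he3(x):
    """Cubic factor x^3 - 3x attached to the third derivative of the normal curve."""
    return x ** 3 - 3 * x


def a3_squared_term(s, m):
    """Polynomial multiplying a_3^2 in (7); m stands for the sample mean."""
    return (-s**6 + 3 * s**4 * m**2 + 6 * s**4 - 3 * s**2 * m**4
            - 9 * s**2 + 9 * m**2 - 6 * m**4 + m**6)


def product_of_third_derivatives(s, m):
    """Product of the cubic factors at the two sample values m + s and m - s.

    Each third derivative carries a minus sign; the two signs cancel in the product.
    """
    return he3(m + s) * he3(m - s)


for s, m in [(0, 0), (1, 2), (2, 3), (3, -1)]:
    print(s, m, a3_squared_term(s, m), product_of_third_derivatives(s, m))
```

Integer arguments keep the comparison exact; the common exponential factor is omitted because it is the same on both sides.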

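To find the distribution of s we must integrate from $-\infty$ to $\infty$ with respect
to $\bar{x}$. Thus, (8) is obtained.

(8) $$
\sqrt{\pi}\, e^{-s^2}\Big[\,a_0^2 + a_0 a_4(2s^4 - 6s^2) + a_3^2\Big(-s^6 + \frac{15}{2}s^4 - \frac{45}{4}s^2 + \frac{15}{8}\Big)
+ a_4^2\Big(s^8 - 14s^6 + \frac{105}{2}s^4 - \frac{105}{2}s^2 + \frac{105}{16}\Big)\Big].
$$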


If we retain only two terms of (3), i.e. use

(9) $f(x) = a_0\varphi_0(x) + a_3\varphi_3(x)$

and consider samples of 3 we obtain as the correlation surface of $\bar{x}$ and s

18 = «r*«*«4«2 - a -*p (-40/ + 2lx/ - 24x) 

V3 L 4 

+ (ln( '/ (-84/ + 525x 2 / - 2752x 4 / 

64 

(10) + 576/ - 1008x 2 / - 288/ - 5586x« + 270x 4 - 1728x 2 ) 

+ (28/ - 6189xV - 28x 4 / - 629x® + 288/ + 1344x 4 

64 

+ 4608x 2 / - 288/ + 729x 2 )l. 





The distribution of s can be obtained as before. The processes involved in 
obtaining (7) and (10) are so complicated that the general rule for writing the 
distribution of s is not apparent. Also, the relation of the distributions of s to 
the corresponding distributions of the second moments about a fixed point is 
not apparent. 

In summary, the general distributions of the second moments about a fixed 
point of samples from a population represented by a definite number of terms 
of a Gram-Charlier series and the distributions of the standard deviations of 
samples of 2 and 3 from the same type of population are given and compared.
No apparent relation exists between them. 



ON THE FINITE DIFFERENCES OF A POLYNOMIAL 

By I. H. Barkey 

In this paper an apparently new and convenient method of finding the successive
finite differences of a polynomial is considered. If operationally

$$\phi(u + r_1 r_2) = E^{r_1 r_2}\,\phi(u) = (1 + \Delta_{r_1})^{r_2}\,\phi(u),$$

then for any polynomial $f(x)$ of degree “n”

$$
\begin{aligned}
f(x) &= p_0 x^n + p_1 x^{n-1} + \cdots + p_n \\
     &= p_0 (x + a)^n + q_{11}(x + a)^{n-1} + \cdots + q_{1n} \\
E_a f(x) &= p_0 (x + a)^n + p_1 (x + a)^{n-1} + \cdots + p_n \\
\Delta_a f(x) &= (p_1 - q_{11})(x + a)^{n-1} + (p_2 - q_{12})(x + a)^{n-2} + \cdots + (p_n - q_{1n}).
\end{aligned}
$$

Similarly, if $f_1(x) = \Delta_a f(x)$, then

$$
\begin{aligned}
f_1(x) &= (p_1 - q_{11})(x + 2a)^{n-1} + q_{22}(x + 2a)^{n-2} + \cdots + q_{2n} \\
E_a f_1(x) &= (p_1 - q_{11})(x + 2a)^{n-1} + (p_2 - q_{12})(x + 2a)^{n-2} + \cdots + (p_n - q_{1n}) \\
\Delta_a f_1(x) &= (p_2 - q_{12} - q_{22})(x + 2a)^{n-2} + \cdots + (p_n - q_{1n} - q_{2n})
\end{aligned}
$$

and so on for the higher orders, since $\Delta_a f_{n-1}(x) = \Delta_a^n f(x)$. In the practical
application of this method, “a” may be conveniently taken as unity, and an
abridged form of synthetic division employed. Thus, if

$f(x) = 5x^4 + 3x^3 + 7x^2 - 2x + 3$, then

      5  +   3  +   7  -   2  ¦  +  3  = f
         -   2  +   9  -  11     + 14
         -   7  +  16  -  27
         -  12  +  28
         -  17

     20  -  21  +  25  -  11  = f_1
         -  41  +  66  -  77
         -  61  + 127
         -  81

     60  - 102  +  66  = f_2
         - 162  + 228
         - 222

    120  - 162  = f_3
         - 282

    120  = f_4



As is evident from the darkened numerals, all figures to the right of the dotted
line are redundant and may be omitted. From the above,

$\Delta f(x) = 20(x + 1)^3 - 21(x + 1)^2 + 25(x + 1) - 11$

$\Delta^2 f(x) = 60(x + 2)^2 - 102(x + 2) + 66$

$\Delta^3 f(x) = 120(x + 3) - 162$

$\Delta^4 f(x) = 120.$
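The scheme can be traced in code. The following is a minimal sketch rather than a transcription of the abridged layout above: it carries out the full synthetic divisions, including the figures the paper marks as redundant, and the function names are assumptions introduced for the illustration. Coefficients are listed from the highest power down, and the k-th difference is returned in powers of (x + ka).

```python
def shift_expand(coeffs, a=1):
    """Rewrite a polynomial in powers of y as one in powers of (y + a).

    Each pass is one synthetic division by (y + a); the remainders, read in
    reverse order, are the new coefficients (highest power first).
    """
    work = list(coeffs)
    remainders = []
    while work:
        for k in range(1, len(work)):
            work[k] -= a * work[k - 1]
        remainders.append(work.pop())
    return remainders[::-1]


def difference(coeffs, a=1):
    """Coefficients of the first difference, in powers of (y + a)."""
    q = shift_expand(coeffs, a)
    # E_a f keeps the original coefficients p_i, so the difference is p_i - q_i;
    # the leading terms are equal and cancel.
    return [p - qi for p, qi in zip(coeffs[1:], q[1:])]


def successive_differences(coeffs, a=1):
    """All differences from the first up to the constant one."""
    out = []
    while len(coeffs) > 1:
        coeffs = difference(coeffs, a)
        out.append(coeffs)
    return out


for row in successive_differences([5, 3, 7, -2, 3]):
    print(row)
```

Applied to the worked polynomial, the rows printed should be the coefficient sets of the four differences listed above.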



SOME PRACTICAL INTERPOLATION FORMULAS 

By John L. Roberts 

Sometimes we wish to find by means of interpolation an approximation to a
particular value of $w_x$ in the interval between the known values, $w_0$ and $w_1$.
But it also might be desirable in the interval from $w_0$ to $w_1$ to interpolate several
approximations to $w_x$ at equidistant values of x. It is very important to know
that a formula which might be very satisfactory to interpolate a particular value
in an interval might seriously fail to be the most satisfactory formula when it
is desired to interpolate several values in the same interval. The range of this
paper is so limited that we only wish to find by means of interpolation several
approximations to the true value of $w_x$ in the interval from $w_0$ to $w_1$ at equidistant
values of x.

One way to perform an interpolation of this sort is to use osculatory inter¬ 
polation. 1 The real function of osculatory interpolation is to secure smooth¬ 
ness at the known points, which are sometimes called pivotal points. By 
roughness is meant that one or more of the successive derivatives are discon¬ 
tinuous at the pivotal points. Experience proves that the osculatory formulas 
usually secure smoothness either at the expense of labor or by a loss of accuracies 
over the entire range from $w_0$ to $w_1$. Frequently the function of interpolation
formulas is to save labor. In many cases it appears reasonable to save labor 
by a loss of both smoothness and accuracy. Formulas are herein selected, 
without direct regard for smoothness, so as to secure the best possible compro¬ 
mise between a maximum of accuracy and a minimum of labor. It appears 
that this results in many cases in a loss of smoothness that is no more objection¬ 
able than the loss in accuracy. 

The actuarial profession, while trying to perfect their methods of constructing 
mortality tables, have made contributions of a high order of scholarship to the 
theory of osculatory interpolation. But since the statistician, the astronomer, 
the physicist, and other scientists also have occasions to make interpolations, 
it seems to be very important to discuss the problem of finding the most prac¬ 
tical methods of interpolation, not only from the special viewpoint of the 
actuary, but also from the general viewpoint of mathematics. 

$\Delta w_x$ is called the first difference of $w_x$, and may be defined by $\Delta w_x = w_{x+1} - w_x$.

1 Since this paper presupposes certain knowledge on the part of the reader, it may be
worth while to indicate some sources of this knowledge. The elementary parts of this
knowledge can be found in any good book on finite differences. “Population Statistics
and Their Compilation” by Hugh H. Wolfenden, published by the Actuarial Society of
America, contains an excellent summary of osculatory interpolation. This summary
indicates some valuable sources of information.



Second, third, and higher differences are merely successive differences of the
first. When use is made of central difference interpolation formulas, it is
convenient to adopt Woolhouse’s notation, which is defined by means of the
following equations: $\Delta w_{-2} = a_{-2}$, $\Delta w_{-1} = a_{-1}$, $\Delta w_0 = a_1$, $\Delta w_1 = a_2$, $\Delta^2 w_{-2} = b_{-1}$,
$\Delta^2 w_{-1} = b_0$, $\Delta^2 w_0 = b_1$, $\Delta^3 w_{-2} = c_{-1}$, $\Delta^3 w_{-1} = c_1$, $\Delta^4 w_{-2} = d_0$, $\Delta^5 w_{-2} = e_1$, $\Delta^6 w_{-3} = f_0$,
etc.

An important family of curves can be represented by

$$u_x = w_0 + x a_1 + \frac{1}{2}\,x(x - 1)B + \frac{1}{6}\,x(x - 1)\left(x - \frac{1}{2}\right)C. \tag{1}$$

Assume $u_0 = w_0$ and $\Delta u_0 = \Delta w_0$. Then a study of (1) shows that $a_1$, which
has already been defined, must be a factor in the second term in order that (1)
may be satisfied when x = 1. (1) is a third degree equation. However, if
C = 0, (1) becomes a second degree equation; if both B = 0 and C = 0, (1)
becomes a first degree equation. In other words, by giving B and C proper
values, (1) can be made to become many different interpolation formulas.

For many purposes interpolation by a first degree formula is not sufficiently
accurate. We, therefore, might wish to interpolate by either a second or a
third degree formula. Since it is possible to draw an unlimited number of
second degree curves or third degree curves between the points $P_0$ and $P_1$, the
problem of selecting the best second degree interpolation curve and the best
third degree curve is of great practical importance.

I 

Suppose that $w_{-2}$, $w_{-1}$, $w_0$, $w_1$, $w_2$, and $w_3$ can be found in a table of values
of the function $w_x$, and that we wish to find by means of interpolation several
approximate values of $w_x$ in the interval from $w_0$ to $w_1$. These six given values
of $w_x$ can be used to determine six pivotal points, which determine a fifth degree
curve. Suppose this curve represents the function $v_x$. Then $w_x$ and $v_x$ would
have exactly the same values at the six pivotal points, but would have values
which are only approximately the same at other points. Using the first six
terms of the Gauss central difference interpolation formula, we have

$$
\begin{aligned}
v_x = v_0 + x a_1 &+ \frac{1}{2}\,x(x - 1)\,b_0 + \frac{1}{6}\,(x + 1)x(x - 1)\,c_1 \\
&+ \frac{1}{24}\,(x + 1)x(x - 1)(x - 2)\,d_0 \\
&+ \frac{1}{120}\,(x + 2)(x + 1)x(x - 1)(x - 2)\,e_1.
\end{aligned}
$$

It is proper to use in this formula the differences $a_1$, $b_0$, etc., which have already
been defined as differences of $w_x$ because these differences are exactly equal to
the corresponding differences of $v_x$. Suppose $P_0$, $P_{1/3}$, $P_{2/3}$, and $P_1$ are four points
which are determined by $v_x$. Then B and C can be determined so that (1) will
represent the curve which can go through these four points.

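Then

$$u_{1/3} = w_0 + \frac{1}{3}\,a_1 - \frac{1}{9}\left(B - \frac{1}{18}\,C\right)$$

and

$$v_{1/3} = v_0 + \frac{1}{3}\,a_1 - \frac{1}{9}\left(b_0 + \frac{4}{9}\,c_1 - \frac{5}{27}\,d_0 - \frac{7}{81}\,e_1\right).$$

Also

$$u_{2/3} = w_0 + \frac{2}{3}\,a_1 - \frac{1}{9}\left(B + \frac{1}{18}\,C\right)$$

and

$$v_{2/3} = v_0 + \frac{2}{3}\,a_1 - \frac{1}{9}\left(b_0 + \frac{5}{9}\,c_1 - \frac{5}{27}\,d_0 - \frac{8}{81}\,e_1\right).$$

Since $u_{1/3} = v_{1/3}$ and $u_{2/3} = v_{2/3}$, we have two equations, which can be solved for B
and C.

$$B = b - \frac{5}{27}\,d \quad\text{and}\quad C = c_1 - \frac{1}{9}\,e_1 \tag{2}$$

where b and d are defined by

$$b = \frac{1}{2}(b_0 + b_1) \quad\text{and}\quad d = \frac{1}{2}(d_0 + d_1).$$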
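The solution for B and C can be checked with exact arithmetic. The sketch below is an illustration under assumptions: it takes (1) with the coefficient 1/6 and the factor (x - 1/2) on the term in C, takes the coefficients of (2) as 5/27 and 1/9, uses $b_1 = b_0 + c_1$ and $d_1 = d_0 + e_1$, and employs function names invented for the example. It then confirms that the curve meets the six-term Gauss curve at x = 1/3 and x = 2/3 for arbitrary integer differences.

```python
from fractions import Fraction as F


def gauss_six_terms(x, v0, a1, b0, c1, d0, e1):
    """First six terms of the Gauss central difference formula."""
    return (v0 + x * a1
            + x * (x - 1) / 2 * b0
            + (x + 1) * x * (x - 1) / 6 * c1
            + (x + 1) * x * (x - 1) * (x - 2) / 24 * d0
            + (x + 2) * (x + 1) * x * (x - 1) * (x - 2) / 120 * e1)


def family_curve(x, w0, a1, B, C):
    """Equation (1), assumed form with 1/6 and (x - 1/2) on the C term."""
    return w0 + x * a1 + x * (x - 1) / 2 * B + x * (x - 1) * (x - F(1, 2)) / 6 * C


def formula_2(b0, c1, d0, e1):
    """B and C of (2), assumed coefficients 5/27 and 1/9; integer inputs."""
    b1, d1 = b0 + c1, d0 + e1
    B = F(b0 + b1, 2) - F(5, 27) * F(d0 + d1, 2)
    C = c1 - F(1, 9) * e1
    return B, C


v0, a1, b0, c1, d0, e1 = 7, 3, 2, 5, -4, 11   # arbitrary test differences
B, C = formula_2(b0, c1, d0, e1)
for x in (F(1, 3), F(2, 3)):
    print(x, family_curve(x, v0, a1, B, C) == gauss_six_terms(x, v0, a1, b0, c1, d0, e1))
```

Fractions are used so that agreement at the two interior points is exact rather than approximate.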

A study of (1) shows that $u_{1/2}$ does not depend upon C because the term containing
C becomes zero when $x = \frac{1}{2}$, and also shows that $u_x$ over the entire range
from $u_0$ to $u_1$ is more sensitive to errors in B than errors in C. The B in (2)
usually contains some error because the six terms of the Gauss formula which
were used in determining B usually produce results which are only approximate.
Consequently a comparatively large error in C would not produce an important
error.

Assume

$$B = b - \frac{5}{27}\,d \quad\text{and}\quad C = c_1 - \frac{5}{27}\,e_1. \tag{3}$$

B is the same in both (2) and (3), but C is not the same. The accuracy of 
(2) and the accuracy of (3) do not differ by an important amount. On the 
other hand, if any attempt to apply (2) is compared with the working illustra¬ 
tions of (3) in this article, it will be found that (2) to an important extent is 
more laborious than (3). Therefore (3) is a better compromise between a 
maximum of accuracy and a minimum of labor than (2). For this reason (2)
ought not to be regarded as a practical formula. On the other hand (2) because
of its great accuracy serves as an ideal with which other formulas can be com¬ 
pared. In other words (2) is of theoretical importance. 

In like manner another interpolation formula can be found if we use the first
four terms of the Gauss formula to determine $P_{1/2}$. Then

$$u_{1/2} = w_0 + \frac{1}{2}\,a_1 - \frac{1}{8}\,B$$

and

$$v_{1/2} = v_0 + \frac{1}{2}\,a_1 - \frac{1}{8}\left(b_0 + \frac{1}{2}\,c_1\right).$$

Since $u_{1/2} = v_{1/2}$, we can solve for B, and C is left arbitrary. If C = 0, we again
get an excellent compromise between a maximum of accuracy and a minimum
of labor. The following second degree formula results.

$$B = b \quad\text{and}\quad C = 0. \tag{4}$$


In order that the value of (3) and (4) may be appreciated, they are herein 
compared with some other formulas which have been of historical importance. 

If the point $P_{1/2}$ can first be accurately determined, a second degree curve
through the points $P_0$, $P_{1/2}$, and $P_1$ would probably give more accurate results
than such a curve through the points $P_0$, $P_1$, and $P_2$ because the first three
points are in a smaller neighborhood; the second curve can be represented by
the first three terms of the Gregory-Newton interpolation formula. The points
$P_{-1}$, $P_0$, $P_1$, and $P_2$ determine a third degree curve, which can be represented
by the first four terms of the Gauss central difference formula. It is probable
that these terms would determine $P_{1/2}$ much more accurately than the first three
terms of the Gregory-Newton formula because the latter is not a central difference
formula with respect to $P_{1/2}$ and because four terms usually give more
accurate results than only three terms. Consequently there is a strong prob¬ 
ability that (4) is more accurate than the first three terms of the Gregory- 
Newton formula. In like manner (4) is more accurate than the first three terms 
of the Gauss formula. It is interesting to observe that (4) is the first three 
terms of the Newton-Bessel formula. 


If $B = b$ and $C = 3c_1$,

then (1) is equivalent to Karup’s osculatory interpolation formula in terms of
differences taken centrally. B is the same in both (4) and Karup’s formula.
No interpolation formula can be very accurate unless C is about equal to $c_1$.
Since, then, the error in C in Karup’s formula is about twice as great as the error 
in C in (4), his formula is distinctly less accurate than (4). Since (4) is a second 
degree curve and Karup’s formula is a third degree curve, his formula is very 
much more laborious. (4) is extremely accurate for a formula having its labor 
saving properties; for many purposes its roughness and inaccuracy appear to
be in about the right proportion. On the other hand Karup’s formula is extremely
inaccurate for a formula so laborious; its only good point is its smoothness.

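Changing somewhat the meanings of u and w, (3) may be written

$$
\begin{aligned}
u_{x+n} = u_n + x\,\Delta u_n
&+ \frac{1}{2}\,x(x - 1)\left[\frac{1}{2}\left(\Delta^2 w_n + \Delta^2 w_{n-1}\right) - \frac{5}{54}\left(\Delta^4 w_{n-1} + \Delta^4 w_{n-2}\right)\right] \\
&+ \frac{1}{6}\,x(x - 1)\left(x - \frac{1}{2}\right)\left(\Delta^3 w_{n-1} - \frac{5}{27}\,\Delta^5 w_{n-2}\right).
\end{aligned}
$$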


then 


du _ 
dx = 


| felt, 

which is the amount of discontinuity in $\frac{du}{dx}$ at A. (3) has greater smoothness
than (4); in other words (3) is more like an osculatory formula. On the other
hand

$$B = b - \frac{1}{6}\,d \quad\text{and}\quad C = c_1 - \frac{1}{6}\,e_1, \tag{5}$$

which is equivalent to an important osculatory interpolation formula by Mr.
Robert Henderson, compares much better with (3) from the viewpoint of labor
saving and accuracy than Karup’s formula does with (4).

II 

An excellent formula can be easily spoiled if the method of applying it is not 
practical. Mr. Henderson, in the Transactions of the Actuarial Society of 
America, Vol. IX, applies (5) in such a way that the numerical work is very 
convenient. Some writers seem to have been very careless about this matter. 
A method intended to interpolate several values between $w_0$ and $w_1$ should
provide that the end value $w_1$ shall be exactly reproduced if no error is made in
the computation. In other words a good method should provide a check upon 
the work. At the same time, in order to avoid unnecessary labor, the work 
should not retain unnecessary decimal places or figures. In other words ficti¬ 
tious accuracy should be avoided. The following working illustrations are in¬ 
tended to show good methods of application of formulas and to show how much 
labor is necessary in order to apply them; also the size of the errors can be used 
to illustrate the theory. 





When (4) is applied at either end of the table, where terms are not available
for the calculation of the differences required, it should be assumed that the
fourth differences that cannot be computed vanish and the required differences
should be filled in consistently with that assumption. $\Delta w_x$ represents the first
differences. But it is convenient to have S represent the first differences in
such a manner that they are arranged centrally in the working illustration. $S^2$
in like manner represents the second differences. The 2 in $S^2$ means $S^2$ is a
second difference, and does not have the familiar meaning used in algebra. In
the case of (4), $\Delta u_x = a_1 + xB$, $\Delta^2 u_x = B$, and the higher differences all equal
zero. Since we wish in the working illustration of (4) to interpolate four values
between $w_0$ and $w_1$, $\delta$ and $\delta^2$ are defined by $\delta u_x = u_{x+.2} - u_x$ and $\delta^2 u_x = \delta u_{x+.2} - \delta u_x$.
It is proved in any good book on finite differences that there are possibilities
that $\Delta$ and $\delta$, which are symbols of operation, can be separated from the
functions upon which they operate, and they can be treated as if they were
algebraic numbers. Consequently $1 + \delta = (1 + \Delta)^{1/5}$. In other words by means
of the binomial law $\delta u_x = (.2\Delta - .08\Delta^2)u_x$, where all the terms within the parenthesis
are to be considered as operating on $u_x$, and $\delta^2 u_x = .04\Delta^2 u_x$. s, $s_x$, and
$s^2$ are defined by $s = s_x = \delta u_x$ and $s^2 = \delta^2 u_x$. Therefore the middle $s = \delta u_{.4} = .2a_1$,
and $s^2 = .04B = .02(b_0 + b_1)$. We are now in position to apply (4)
to the case when $w_n = (1.04)^n$. It might prevent confusion if it is stated that
x and n are related to each other in such a way that we always interpolate
between $w_0$ and $w_1$.


   n    (1.04)^n       s       a, b      s²

  80    23.050       .9218     .845
  81    23.9718      .9603
  82    24.9321      .9988    4.994     .0385
  83    25.9309     1.0373
  84    26.9682     1.0758
  85    28.044      1.1190    1.081
  86    29.1630     1.1670
  87    30.3300     1.2150    6.075     .0480
  88    31.5450     1.2630
  89    32.8080     1.3110
  90    34.119      1.3636    1.317
  91    35.4826     1.4210
  92    36.9036     1.4784    7.392     .0574
  93    38.3820     1.5358
  94    39.9178     1.5932
  95    41.511                1.553
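The working illustration of (4) can be reproduced from the four given values alone. The sketch below is one reading of the procedure described in the text, not the author's own computation: the end second differences are filled in by continuing the arithmetical progression (vanishing fourth differences), each $s^2$ is taken as $.02(b_0 + b_1)$, the middle $s$ as $.2a_1$, and rounding is to four decimals as in the table. The helper name `q4` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

def q4(x):
    """Round to four decimal places, as in the printed table."""
    return x.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

# Given values of (1.04)^n at n = 80, 85, 90, 95.
given = [Decimal("23.050"), Decimal("28.044"), Decimal("34.119"), Decimal("41.511")]

a = [v - u for u, v in zip(given, given[1:])]   # first differences of the given values
b = [y - x for x, y in zip(a, a[1:])]           # second differences (interior only)
step = b[1] - b[0]
# End values filled in on the assumption that the missing fourth differences vanish.
b = [b[0] - step] + b + [b[-1] + step]

table = [given[0]]
for j, a1 in enumerate(a):
    s2 = q4(Decimal("0.02") * (b[j] + b[j + 1]))   # s^2 = .02(b0 + b1)
    mid = q4(Decimal("0.2") * a1)                  # middle s = .2 a1
    for k in (-2, -1, 0, 1, 2):                    # five s values centred on the middle one
        table.append(table[-1] + mid + k * s2)

for n, value in zip(range(80, 96), table):
    print(n, value)
```

Because the five values of $s$ in each interval are symmetric about the middle one, their sum is exactly $a_1$, so each given value is reproduced without any adjustment.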






Some of the explanation of the application of (4) applies to (3) and does
not need to be repeated. The method herein used of applying (3) is either the
same as or a development of the Henderson method of applying (5). If it is
desired to apply (3) at either end of the table, where terms are not available
for the calculation of the differences required, it can be assumed that the sixth
differences that cannot be computed vanish, and the required differences can
be filled in consistently with that assumption. A study of the theory underlying
this assumption shows that it does not result in a true central difference
formula and that it consequently results usually in some loss of accuracy. In
the case of (3) before the finding of the differences of (1), it is convenient to
write it as follows:

$$u_x = u_0 + x a_1 + \tfrac{1}{2}x(x-1)\left(B + \tfrac{1}{2}C\right) + \tfrac{1}{6}x(x-1)(x-2)C.$$

Then

$$\Delta u_x = a_1 + x\left(B + \tfrac{1}{2}C\right) + \tfrac{1}{2}x(x-1)C,$$

$$\Delta^2 u_x = \left(B + \tfrac{1}{2}C\right) + xC, \quad \text{and} \quad \Delta^3 u_x = C.$$

Suppose we wish to interpolate four values between $u_0$ and $u_1$. $\delta$ and $\delta^2$
have already been defined. $\delta^3 u_x = \delta^2 u_{x+.2} - \delta^2 u_x$. Then $1 + \delta = (1 + \Delta)^{1/5}$,
or $\delta u_x = (.2\Delta - .08\Delta^2 + .048\Delta^3)u_x$. Also $\delta^2 u_x = (.04\Delta^2 - .032\Delta^3)u_x$ and
$\delta^3 u_x = .008\Delta^3 u_x$. $s^2$, $s^2_x$, and $s^3$ are defined by $s^2 = s^2_x = \delta^2 u_{x-.2}$, and
$s^3 = s^3_x = \delta^3 u_x$. The first

$$s^2 = \delta^2 u_{-.2} = .04\left(B - \tfrac{1}{2}C\right) = .04\left(b_0 - \tfrac{5}{27}d_0\right).$$

The last

$$s^2 = \delta^2 u_{.8} = .04\left(B + \tfrac{1}{2}C\right) = .04\left(b_1 - \tfrac{5}{27}d_1\right).$$

.1852 might be a useful approximation to $\tfrac{5}{27}$. The remaining $s^2$'s should be
filled in so that they are in arithmetical progression with irregularities at the
ends. If the irregularities can be distributed equally at both ends, the
irregularities cause an error in C, but none in B. Errors in B are more important
than those in C. The middle $s = \delta u_{.4} = .2a_1 - s^3$. In the following working
illustration, $u_x = \sin n$.
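The statement that the middle $s$ equals $.2a_1 - s^3$ can be verified by applying the operators term by term at $x = .4$. The short derivation below assumes the expansion of (3) in the form $u_x = u_0 + xa_1 + \tfrac12 x(x-1)(B + \tfrac12 C) + \tfrac16 x(x-1)(x-2)C$, which is how the displayed equation is read here.

```latex
\begin{align*}
\Delta u_{.4}   &= a_1 + .4\left(B + \tfrac12 C\right) + \tfrac12(.4)(-.6)\,C = a_1 + .4B + .08C,\\
\Delta^2 u_{.4} &= \left(B + \tfrac12 C\right) + .4C = B + .9C, \qquad \Delta^3 u_{.4} = C,\\
\delta u_{.4}   &= .2\,(a_1 + .4B + .08C) - .08\,(B + .9C) + .048\,C\\
                &= .2a_1 + (.08 - .08)B + (.016 - .072 + .048)C\\
                &= .2a_1 - .008C = .2a_1 - s^3 ,
\end{align*}
```

since $s^3 = \delta^3 u_x = .008\Delta^3 u_x = .008C$. The terms in B cancel, which is why an error in B does not disturb the middle first difference.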





    n      sin n         a          b          c          d

  -60    -.86603      .36603
  -30    -.50000      .50000     .13397    -.13397
    0     .00000      .50000     .00000    -.13397     .00000
   30     .50000      .36603    -.13397    -.09809     .03588
   60     .86603      .13397    -.23206
   90    1.00000


    n      sin n          s           s²           s³

    0     .00000       .104498     .000000
    6     .104498      .103374    -.001124
   12     .207872      .101125    -.002249    -.001125
   18     .308997      .097751    -.003374
   24     .406748      .093252    -.004499
   30     .50000                  -.005624
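The key quantities of this illustration can be recomputed from the given values of sin n. The sketch below is a hedged reading of the text rather than the author's own working: it takes the coefficient of d to be 5/27 (the fraction that .1852 approximates), takes $s^3$ as one fifth of the difference between the last and the first $s^2$, and uses exact fractions. All variable names are illustrative.

```python
from fractions import Fraction

# Given values of sin n at n = -60, -30, 0, 30, 60, 90 (five decimals, as printed).
u = [Fraction(v, 100000) for v in (-86603, -50000, 0, 50000, 86603, 100000)]

def diff(seq):
    return [y - x for x, y in zip(seq, seq[1:])]

a = diff(u)
b = diff(a)
c = diff(b)
d = diff(c)

a1 = a[2]              # u(30) - u(0)
b0, b1 = b[1], b[2]    # second differences centred on n = 0 and n = 30
d0, d1 = d[0], d[1]    # fourth differences centred on n = 0 and n = 30

k = Fraction(5, 27)    # assumed coefficient of d (approximately .1852)
first_s2 = Fraction(4, 100) * (b0 - k * d0)
last_s2 = Fraction(4, 100) * (b1 - k * d1)
s3 = (last_s2 - first_s2) / 5           # common difference of the s^2 column
middle_s = Fraction(2, 10) * a1 - s3    # middle s = .2 a1 - s^3

print(float(first_s2), float(last_s2), float(s3), float(middle_s))
```

Under these assumptions $s^3$ and the middle $s$ round to the printed $-.001125$ and $.101125$; the last $s^2$ comes out within one unit of the sixth decimal place of the printed $-.005624$, which is consistent with the small irregularities the text allows at the ends.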



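Suppose we wish to interpolate nine values between $u_0$ and $u_1$ by the use of
(3). Then $\delta u_x = u_{x+.1} - u_x$, $\delta^2 u_x = \delta u_{x+.1} - \delta u_x$, and
$\delta^3 u_x = \delta^2 u_{x+.1} - \delta^2 u_x$. Consequently $1 + \delta = (1 + \Delta)^{1/10}$, or
$\delta u_x = (.1\Delta - .045\Delta^2 + .0285\Delta^3)u_x$. Then $\delta^2 u_x = (.01\Delta^2 - .009\Delta^3)u_x$ and
$\delta^3 u_x = .001\Delta^3 u_x$. $s^2 = s^2_x = \delta^2 u_{x-.1}$ and $s^3 = s^3_x = \delta^3 u_x$. The first

$$s^2 = \delta^2 u_{-.1} = .01\left(B - \tfrac{1}{2}C\right) = .01\left(b_0 - \tfrac{5}{27}d_0\right).$$

The last

$$s^2 = \delta^2 u_{.9} = .01\left(B + \tfrac{1}{2}C\right) = .01\left(b_1 - \tfrac{5}{27}d_1\right).$$

$$\delta u_{.4} = (.1a_1 - 4s^3) - \tfrac{1}{2}\delta^2 u_{.4} \quad \text{and} \quad \delta u_{.5} = (.1a_1 - 4s^3) + \tfrac{1}{2}\delta^2 u_{.4}.$$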






    n      sin n          s           s²           s³

    0     .00000       .052318     .000000
    3     .052318      .052179    -.000139
    6     .104497      .051899    -.000280
    9     .156396      .051478    -.000421
   12     .207874      .050916    -.000562    -.000141
   15     .258790      .050212    -.000703
   18     .309002      .049368    -.000844
   21     .358370      .048383    -.000985
   24     .406753      .047257    -.001126
   27     .454010      .045990    -.001267
   30     .50000                  -.001406


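Suppose we wish to interpolate five values between $u_0$ and $u_1$. The first
$s^2 = \tfrac{1}{36}\left(b_0 - \tfrac{5}{27}d_0\right)$ and the last $s^2 = \tfrac{1}{36}\left(b_1 - \tfrac{5}{27}d_1\right)$, and

$$\delta u_{1/3} = \tfrac{1}{6}\left(a_1 - 8\delta^3 u_x\right) - \tfrac{1}{2}\delta^2 u_{1/3},$$

$$\delta u_{1/2} = \tfrac{1}{6}\left(a_1 - 8\delta^3 u_x\right) + \tfrac{1}{2}\delta^2 u_{1/3}.$$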


In the following working illustration the given values of sin n are written
correct to five decimal places; in other words after each decimal point there are
five symbols or digits representing numbers; also each of these symbols is written
in the scale of ten. It can be observed that some values of $u_x$, $s$, $s^2$, and $s^3$ in
the working illustration have six symbols to the right of the decimal point, and
that some values have seven symbols to the right of the decimal point. In all
cases the sixth symbol to the right of the decimal point is written in the scale
of ten, and the seventh symbol is written in the scale of six. This procedure
provides a check by exactly reproducing $u_1$. Also this procedure does not cause
much fictitious accuracy, and can be quickly used after a little practice.


    n      sin n           s            s²           s³

    0     .00000        .0871305     .000000
    5     .0871305      .0864795    -.000651
   10     .1736104      .0851775    -.001302
   15     .2587883      .0832245    -.001953    -.000651
   20     .3420132      .0806205    -.002604
   25     .4226341      .0773655    -.003255
   30     .50000                    -.003906
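The device of writing the seventh symbol in the scale of six can be imitated with exact fractions. The sketch below is an illustration under stated assumptions, not the author's procedure verbatim: it takes $s^3 = -.000651$ and an exactly arithmetical $s^2$ column from the table, and obtains the two middle first differences as $\tfrac16(a_1 - 8s^3) \mp \tfrac12\delta^2u_{1/3}$. The names `UNIT`, `to_units` and `show` are hypothetical helpers.

```python
from fractions import Fraction

UNIT = Fraction(1, 6_000_000)   # one sixth of a unit in the sixth decimal place

def to_units(x):
    """Express x as a whole number of UNITs; fail if it is not one."""
    n = Fraction(x) / UNIT
    if n.denominator != 1:
        raise ValueError("value is not representable with a seventh symbol in scale six")
    return int(n)

def show(units):
    """Six decimal digits followed by a seventh symbol written in the scale of six."""
    dec, sixth = divmod(units, 6)
    return f".{dec:06d}{sixth}"

a1 = Fraction(1, 2)                   # sin 30 - sin 0
s3 = Fraction(-651, 1_000_000)        # third difference, from the illustration
s2 = [k * s3 for k in range(7)]       # s^2 column for rows n = 0, 5, ..., 30
base = (a1 - 8 * s3) / 6

s = [None] * 6
s[2] = base - s2[3] / 2               # first difference at x = 1/3
s[3] = base + s2[3] / 2               # first difference at x = 1/2
s[1] = s[2] - s2[2]
s[0] = s[1] - s2[1]
s[4] = s[3] + s2[4]
s[5] = s[4] + s2[5]

u = [Fraction(0)]
for step in s:
    u.append(u[-1] + step)

print([show(to_units(x)) for x in u[1:6]], u[6])
```

Every intermediate value is a whole number of sixths of a unit in the sixth place, so nothing is rounded and the six first differences add up to exactly .5.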







In general if we wish to interpolate i − 1 values between $u_0$ and $u_1$ when i
is neither five nor ten, $u_1$ can be exactly reproduced if some of the symbols are
written in the scale of i. If i = 12, it is evident that we need two extra symbols,
say t and e, to stand for ten and eleven respectively. If we wish to interpolate
i − 1 values between $u_0$ and $u_1$ by the use of (4), in the computation each of
$u_x$, $s$ and $s^2$ except the given values should contain one more symbol than each
given value contains, and the extra symbol should be written in the scale of i.



ON EVALUATING A COEFFICIENT OF PARTIAL CORRELATION 

By Grace Strecker 


It is to be shown here that when the multiple correlation coefficient $R_{n.12\cdots(n-1)}$
is found by the method of Horst 1 the partial correlation coefficient $R_{n(n-1).12\cdots(n-2)}$
can be found in terms of the $\beta$'s. If we are interested only in evaluating a
partial correlation between two variables, we may also employ the method which
will be given here.

Without loss of generality the dependent variables may be chosen to be the
nth and (n − 1)st. The coefficient of partial correlation as given by Rietz 2
may be expressed in the following form:


$$R_{n(n-1).12\cdots(n-2)} = \sqrt{\frac{\dfrac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}} - \dfrac{R}{R_{nn}}}{\dfrac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}}}} \tag{1}$$
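Equation (1) can be checked numerically in the three-variable case, where the partial correlation also has a familiar closed form. The sketch below assumes that $R$ is the determinant of the correlation matrix and that $R_{nn}$, $R_{(n-1)(n-1)}$ and $R_{(n-1)(n-1)nn}$ are the determinants left after deleting the indicated rows and columns; the correlation values are made up for illustration, and the function names are hypothetical.

```python
from math import sqrt

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total

def drop(m, idx):
    """Delete the rows and columns whose 0-based indices are in idx."""
    keep = [i for i in range(len(m)) if i not in idx]
    return [[m[i][j] for j in keep] for i in keep]

def partial_from_minors(r):
    """Magnitude of the partial correlation of variables n and n-1, as in equation (1)."""
    n = len(r)
    R = det(r)
    R_nn = det(drop(r, {n - 1}))
    R_pp = det(drop(r, {n - 2}))             # R_(n-1)(n-1)
    R_ppnn = det(drop(r, {n - 2, n - 1}))    # R_(n-1)(n-1)nn
    ratio = R_pp / R_ppnn
    return sqrt((ratio - R / R_nn) / ratio)

# Illustrative correlations: r12 = .5, r13 = .4, r23 = .6.
r = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.6],
     [0.4, 0.6, 1.0]]
direct = (0.6 - 0.5 * 0.4) / sqrt((1 - 0.5 ** 2) * (1 - 0.4 ** 2))
print(partial_from_minors(r), direct)
```

The square root recovers only the magnitude of the coefficient; its sign has to be taken from the corresponding cofactor, as in the usual determinantal formula.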


$R_{(n-1)(n-1)}$ may be treated as a new determinant $R'$. Regarding its elements
as the coefficients of a set of normal equations (n − 1 in all) whose constant
terms are zero, we may follow through the Doolittle elimination process. For
the case where n = 4 we have the table given below.

In comparing this outline with the one illustrating the Doolittle elimination 
process for R when n = 4 we see that 


/ 1122 

722 = 722 “ Wa 7i’ 


733 


33 — a 33 — 13 “ a 44 Ea. 


Therefore, we have 


An r A mi 
W * Wa 


-•L- 
11 \ 2 

= H •v** («« - S ^* 4 )' 


1 Horst, Paul, A Method for Solving for a Coefficient of Multiple Correlation, Annals
of Mathematical Statistics, Vol. III, No. 1, Feb. 1932, pp. 40-44.

2 Rietz, H. L., Mathematical Statistics, p. 101.



[Table: outline of the Doolittle elimination process for $R'$ when n = 4, with columns
headed Reciprocal, 1, 2, 3 and rows $\alpha$, $\beta$, $\gamma$ for each stage; the individual
entries are not legible in this copy.]





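In the general case:

$$\gamma'_{11} = \gamma_{11},$$

$$\gamma'_{22} = \gamma_{22},$$

$$\cdots$$

$$\gamma'_{(n-2)(n-2)} = \gamma_{(n-2)(n-2)},$$

$$\gamma'_{(n-1)(n-1)} = \alpha_{nn} - \sum_{i=2}^{n-1}\beta_{in}.$$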







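Hence

$$R_{(n-1)(n-1)} = R' = \prod_{i=1}^{n-2}\gamma_{ii}\left(\alpha_{nn} - \sum_{i=2}^{n-1}\beta_{in}\right).$$

Since $R = \prod_{i=1}^{n}\gamma_{ii}$, then $R_{(n-1)(n-1)nn} = \prod_{i=1}^{n-2}\gamma_{ii}$, from which we see that

$$\frac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}} = \frac{\prod_{i=1}^{n-2}\gamma_{ii}\left(\alpha_{nn} - \sum_{i=2}^{n-1}\beta_{in}\right)}{\prod_{i=1}^{n-2}\gamma_{ii}} = \alpha_{nn} - \sum_{i=2}^{n-1}\beta_{in}.$$

But since $\alpha_{nn} = 1$, then

$$\frac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}} = 1 - \sum_{i=2}^{n-1}\beta_{in}.$$

It has been shown that

$$\frac{R}{R_{nn}} = 1 - \sum_{i=2}^{n}\beta_{in}.$$

Substituting the above values for $\dfrac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}}$ and $\dfrac{R}{R_{nn}}$ in equation (1), we have

$$R_{n(n-1).12\cdots(n-2)} = \sqrt{\frac{1 - \sum_{i=2}^{n-1}\beta_{in} - \left(1 - \sum_{i=2}^{n}\beta_{in}\right)}{1 - \sum_{i=2}^{n-1}\beta_{in}}}$$

or

$$R_{n(n-1).12\cdots(n-2)} = \sqrt{\frac{\beta_{nn}}{1 - \sum_{i=2}^{n-1}\beta_{in}}}.$$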


Hence it is seen that when the $\beta$'s given by Horst (page 42) are calculated,
it is an easy matter to solve for the partial correlation $R_{n(n-1).12\cdots(n-2)}$.


St. Louis University, 



A THEORY OF VALIDATION FOR DERIVATIVE SPECIFICATIONS 

AND CHECK LISTS 1 

By Lee Byrne 

Visiting Professor of Secondary Education, New York University 

Part I. Research Products Which May Be Classified as Derivative 
Specifications and Check Lists 

Meaning of Specification 

In specification something is assigned a specific character. The something 
to be thus assigned a specific character may be called the specificandum. The 
specific character assigned to the specificandum, or (as a second meaning) the 
act of so doing, may be called the specification. 

A proposition is the smallest unit in which it is possible to embody a complete 
thought and is ordinarily represented by a single sentence. In specification 
the characterization may be confined to a single proposition or it may be ex¬ 
tended to include an indefinitely large number of propositions. So a speci¬ 
fication may be embodied in a sentence, a paragraph, a chapter, or a whole book. 
No matter how far it is extended it will never give complete determination, as 
our knowledge cannot be made exhaustive or our control be given an absolute 
precision. 

In view of the meaning assigned to specification it is evident that very many 
books and monographs could in this sense be classified as specifications. 

Meaning of Derivative Specification 

There is a type of specification (book or monograph) which is developed by 
deriving it from a group or class of specifications which already exist. This 
class may be a total class of all such specifications, or a group of those accepted 
as authoritative, or a group of those taken to be representative. A specification 
derived in this manner may be called a derivative specification. As an example 
we could take almost any first-class work by a present-day historian; by his¬ 
torians it would be called “secondary” because it is based on study of pre¬ 
existent documents called “primary sources.” 

Meaning of Check List 

The act of deriving a product from a pre-existent set of documents may, as
we have seen, take the form of a derivative specification, embracing an
assemblage of determinates or determinations. On the other hand the product
derived may be intended merely to indicate the ground covered or to be covered
by determination, without actually selecting the particular determinations.
Such a product will be called a check list. The term is not a very happy one,
but it is in very common use. If we think of a specification as an assemblage
of determinations then a check list could be thought of as a corresponding set
of determinables.2 Since any determinable is capable of an indefinite number
of determinations it is evident that a long check list could give rise to an
extremely large number of different specifications, of which, of course, some
fraction might prove undesirable, inadmissible, or false.

1 This paper is an amplification of a report made in the statistical section of the
American Educational Research Association at its meeting in February, 1931.

Modes of Specification: How We Specify 

If we examine any specification to see how the specifying is done we shall
find that it ultimately takes the form of specification under aspects. The fol¬ 
lowing diagram indicates the principal (perhaps all the) possibilities in the way 
of specification. 

Naming the original or main specificandum 
Naming an aspect 

Characterization of the specificandum under the aspect named 


Naming a relation (includes process, operation etc.) 
Naming an aspect of the relation 

Characterization of the relation under aspect named 


Naming a relatum or thing related (a new specificandum) 
Naming an aspect of the relatum 

Characterization of relatum under aspect named 


Naming a part 

Naming an aspect of the part 

Characterization of the part under aspect named 


(The naming of aspects may be merely implicit but it is always present in 
principle.) 

2 On the notion of the “determinable,” which is due to W. E. Johnson, see his Logic,
Cambridge University Press (1921), Part I, p. xxxv and Chapter XI. 





Thus it appears that if specification is pressed far enough it always ultimately 
becomes specification under aspects. Aspect and determinable may be re¬ 
garded as synonyms. 

Current Examples of Derivative Specifications and Check Lists 

At the present time it will be found that we have very many products of 
research which take forms capable of being classified as some kind of derivative 
specification or (derivative) check list in the senses in which these expressions 
have been explained. 

I have distinguished more than twenty different logical types of derivative 
specification or check list which are exemplified in the current literature of 
educational research and related subjects. However space will not permit 
exhibition of examples of these different types. 

Part II. Validation of Derivative Specifications and Check Lists 

Many research products may be classified as derivative specifications or check 
lists, derivative in the sense that they have been derived from a group of docu¬ 
ments (books, articles, journals, newspapers, courses of study, etc.) through 
analysis of their content. Such source documents themselves we shall call 
specifications or groups of specifications. 

The only validation problem raised here is the question whether the resulting 
check list or derivative specification truly represents the class of source specifi¬ 
cations used. The further question whether the class of source specifications 
itself constitutes a satisfactory source is not discussed. 

From this point of view, if a check list or derivative specification is based in 
some suitable manner on all the documents of the class represented, no real
validation problem arises; the validity has to be regarded as perfect. 

It may often happen that the investigator does not wish to analyse all of 
the specifications of the class in question but prefers to save time and labor by 
confining his analysis to a select group drawn from the total class as a sample. 
In this case the problem arises as to how far results based on such sample should 
be judged to be truly representative of the entire class of specifications (most 
of which have not been analysed). A problem of this nature may be called the 
problem of validity for this kind of work. 

Such a validation problem appears to take the same form whether the product 
to be validated is a derivative specification or (derivative) check list. Accord¬ 
ingly we shall for the sake of brevity carry on the discussion by referring to the 
problem as that of validating (derivative) check lists. The same principles 
would apply if the product happened to be a derivative specification. 

In order to consider the validity of a check list based on a sample group of 
specifications (called here a Sample Check List) we may hypothesize a check 
list based in the same manner on the entire class of specifications from which 
the sample was drawn. Such a hypothetical check list (which is not made) 
will be called the Ideal Check List. Then the problem of validity may be
conceived as the question as to how far the content of the Sample Check List agrees
with the unknown content of the Ideal Check List.

An overlapping of the two appears ordinarily to be certain but a failure of 
complete coincidence is very highly probable. The question is what degree of 
coincidence is to be expected. 

This general validity problem naturally divides into two separate questions. 
The first question asks what proportion of the content of the Sample Check 
List may be expected to be present also in the Ideal Check List; this may be 
called the (sub-) problem of reliability. The second question asks what propor¬ 
tion of the content of the Ideal Check List may be expected to be present in the 
Sample Check List; this may be called the (sub-) problem of completeness. 
The answers to these two problems, if expressed in numerical percentages, could 
be called the Index of Reliability and Index of Completeness respectively. 
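The two indices can be pictured with a toy example in which both lists are known. This is only an illustration: in practice the Ideal Check List is hypothetical, which is exactly why the indices have to be estimated. The item names and function names below are invented.

```python
def reliability(sample_list, ideal_list):
    """Proportion of Sample Check List items that are also in the Ideal Check List."""
    sample, ideal = set(sample_list), set(ideal_list)
    return len(sample & ideal) / len(sample)

def completeness(sample_list, ideal_list):
    """Proportion of Ideal Check List items that are also in the Sample Check List."""
    sample, ideal = set(sample_list), set(ideal_list)
    return len(sample & ideal) / len(ideal)

sample = {"heating", "lighting", "flooring", "plumbing"}
ideal = {"heating", "lighting", "plumbing", "roofing", "glazing"}

print(reliability(sample, ideal), completeness(sample, ideal))
```

Here three of the four sample items are in the ideal list and three of the five ideal items are in the sample list, so the two questions receive different answers from the same overlap.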

We shall first consider these two problems in their simplest form and after¬ 
ward in a more complex form in which they exhibited themselves in a recent 
study by the writer.3 The simple case presents no great difficulty and it is
possible that a different method of disposing of it might be preferred. The more 
complex case, however, appears to be rather difficult of solution and the writer 
has not been able to find in the literature any developed technique for handling 
it. The simple case is presented here primarily because it affords, by further 
extension, a successful approach to the difficult problem of the more com¬ 
plex case. 


Simple Case 

Terms and Symbols 

The “class of specifications” will be understood to consist of all specifications 
which belong to the whole class of specifications regarded as a source, a class 
which we claim to represent in our final product. In this problem the “class” 
will not be regarded as indefinitely large but as consisting of a definite number 
of specifications, a number to be ascertained by actual count or by careful 
estimate. 

“Sample specifications” are the limited group selected from the class for 
purposes of actual analysis, and which play the role of representing the whole 
class. The remaining specifications of the class are not analyzed. 

“Sample Check List Material” is a name for the assemblage of all the different 
items found in one or more sample specifications. 

“Ideal Check List Material” is a name for a hypothetical assemblage of all 
the different items found in one or more specifications in the class. Only those 
appearing in some sample specifications can be actually known, the rest are 
hypothetical. 

3 Byrne, L. Check List Materials for Public School Building Specifications. Teachers 
College, Columbia University. 1931. 





Write 

M (constant) = total number of specifications in class 
N (variable) = number of these specifications in which a particular item 
under consideration appears (this number is hypothetical and some 
of the particular items themselves are hypothetical) 
m (constant) = number of sample specifications 

n (variable) = number of sample specifications in which a particular 
(the same) item appears 

Values of n may be expected to vary for different items, from m to 0 by
intervals of 1, the zero value appertaining to any item wholly absent from the Sample
Check List Material (hypothetically present in Ideal Check List Material).

Values of N might be expected to vary, for different items, from M to 1 by
intervals of 1. But in this problem the convention will be adopted that the
range is from M downward by intervals of $M/m$. Thus if the number M should
be five times as large as the number m then the range for N would be treated
as proceeding from M downward by intervals of 5: M, M − 5, M − 10, ⋯ 5.
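The convention on the range of N is easily tabulated. The small sketch below assumes, as the example in the text does, that M is an exact multiple of m; the function name is illustrative.

```python
def ideal_cell_values(M, m):
    """Values of N allowed by the convention: M, M - M/m, M - 2M/m, ..., M/m."""
    if M % m:
        raise ValueError("this sketch assumes M is a multiple of m")
    step = M // m
    return list(range(M, 0, -step))

print(ideal_cell_values(20, 4))
```

With M = 20 and m = 4 there are four admissible values of N, one for each non-zero value of n, which is what allows the sample tabulation to be matched cell for cell with the hypothetical one.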

A “tabulation” will mean a statistical table showing how many different 
items appear in every possible number of specifications. A tabulation must be 
made by actual count for the items of the sample specifications, and will show 
the number of items having each possible value of n. A similar tabulation is 
hypothetical for the items in all the specifications of the class, that is for the 
number of items having each value of N permitted by the convention of the 
last paragraph. 

“Tabulation cell” (or simply “cell”) will mean, as needed, either the number 
of items or the group of items appearing in any designated number of specifi¬ 
cations. For Sample Check List Material it will be the number or group of 
items to which a particular value of n appertains; for Ideal Check List similarly
the number of items or group of items to which a particular value of N appertains 
(hypothetically). 

“Sample Check List” will mean a list of items selected from the Sample
Check List Material according to some adopted criterion. For illustrative
purposes we shall consider this criterion to be, for example, the numerical
ratio $n \geq m/4$.

“Ideal Check List” will mean a list of items selected from the Ideal Check
List Material according to some adopted criterion. For illustrative purposes
we shall consider this criterion to be the numerical ratio $N \geq M/4$.

Problem of Reliability 

The problem of reliability may be restated and renamed the General
Reliability Problem. This may be broken up into a group of problems which will
be called Elementary Reliability Problems. Each of the latter may be in turn
broken up into a group of problems which will be called Ultimate Reliability
Problems. Each Ultimate Reliability Problem may be solved directly.
Combination of these solutions will yield solutions of the Elementary Reliability
Problems. Combinations of the latter solutions will finally yield the solution
of the General Reliability Problem.

These problems will now be stated.

General Reliability Problem: What proportion of the items present in Sample 
Check List may be expected to be present also in Ideal Check List? 

Elementary Reliability Problem: What proportion of the items in a particular 
cell in Sample Check List may be expected to be present also in Ideal Check 
List? 

Ultimate Reliability Problem: What proportion of the items in a particular 
cell in Sample Check List may be expected to be present also in some designated 
cell in Ideal Check List?

To solve an Ultimate Problem: 

From the Fundamental Theorem in the Theory of Inductive Probability
(Whittaker, E. T. and Robinson, G. The Calculus of Observations. London:
Blackie & Son. 1924. p. 305) the solution may be expressed as

$$\frac{P_R\,p_s}{\sum Pp}.$$

Whittaker and Robinson's statement of the Fundamental Theorem in the
Theory of Inductive Probability is as follows (form slightly changed without
change in meaning):

“Suppose that a certain observed phenomenon may be accounted for by any
one of a certain number of hypotheses, of which one, and not more than one,
must be true: suppose moreover that the probability of the R-th hypothesis,
as based on information in our possession before the phenomenon is observed,
is $P_R$, while the probability of the observed phenomenon, on the assumption of
the truth of the R-th hypothesis, is $p_s$. Then when the observation of the
phenomenon is taken into consideration, the probability of the R-th hypothesis is

$$\frac{P_R\,p_s}{\sum Pp},$$

where the symbol $\sum$ denotes the summation over all the hypotheses.” 4

It is clear that an Ultimate Reliability Problem is a case falling under this
Fundamental Theorem. The observed phenomenon is any item occurring in
any specified cell of Sample Check List, say cell n = s. It may be accounted
for by a certain number of hypotheses as to its source in the Ideal Check List
Material; the different cells in the Ideal Check List Material are these different
hypotheses of origin, hypothetical because we do not know from which one it
has come but only that it must have come from some one of them; the cell from
which it actually comes is the true hypothesis, though we do not know which
one that is. That the origin of the item is in cell N = R is the R-th hypothesis,
and its probability is written $P_R$. The probability of the occurrence of the
phenomenon on the assumption of the truth of the R-th hypothesis is the
probability that an item in cell N = R will appear in Sample Check List in cell n = s
and its probability is written $p_s$. As we clearly have in our Ultimate Reliability
Problem a case falling under the Fundamental Theorem quoted we may accept
as the required solution of the Ultimate Reliability Problem the formula already
given in the initial statement:

$$\frac{P_R\,p_s}{\sum Pp}.$$

4 For the fundamental position of this theorem in a theory of science and for its proof
one may also consult Jeffreys, H. Scientific Inference. Cambridge: Cambridge University
Press. 1931. Chapter II (section 2.34).

This expresses the probability that any item found in Sample-Check-List cell
n = s comes from (and appears in) Ideal-Check-List-Material cell N = R, or
it gives the proportion of items found in Sample-Check-List cell n = s that
may be expected to come from (or appear in) Ideal-Check-List-Material cell
N = R.
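The formula is a direct application of the theorem quoted above and can be sketched in a few lines. The priors and likelihoods below are invented numbers for three hypothetical source cells, used only to show the arithmetic; `posterior` is an illustrative name.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """P_R p_s / sum(P p), evaluated for every hypothesis R."""
    joint = [P * p for P, p in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical prior probabilities P for three source cells, and the probability p
# that an item from each of them lands in the observed sample cell.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

post = posterior(priors, likelihoods)
print(post)
```

The results sum to one, and a cell with a small prior can still be the most probable source when items from it are very likely to fall in the observed sample cell.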

Meaning of any value of P (say $P_R$) = the probability that any item, drawn
at random from those cells of Ideal Check List Material which are possible
sources of items in Sample-Check-List cell n = s, will happen to be drawn from
cell N = R.

Meaning of any value of p (say $p_s$) = the probability that any item in
Ideal-Check-List cell N = R will also be present in Sample-Check-List cell n = s.
(Important: this supposition is not equivalent to its converse.)

Evaluation of $P_R$:

$$P_R = \frac{\text{number of items in cell } N = R}{\text{number of items in all cells which are possible sources of items in cell } n = s}.$$

For this ratio it is necessary to assume that the shape of the numerical curve 
formed by the group of Ideal-Check-List-Material cells is the same as that of 
the numerical curve formed by the group of Sample-Check-List-Material cells. 
On this assumption we may replace the numerator by the number of items in 
the Sample-Check-List-Material cell having an abscissa corresponding to that 
of the Ideal-Check-List-Material cell N = R, and replace the denominator by 
the sum of the numbers of items in all the cells with abscissae corresponding to 
those of Ideal-Check-List-Material cells which are possible sources of items in 
cell n = s. 

Evaluation of $p_s$:

By the aid of “the definition of probability which is used in practically all
treatises on the subject” (Coolidge, J. L. An Introduction to Mathematical
Probability. Oxford: Oxford University Press. 1925. p. 4) and the principle
underlying the Theory of Combinations (Whitworth, W. A. Choice and Chance.
New York: G. E. Stechert & Co. 1927. Proposition II) we are able to arrive
at the evaluation:

$$p = \frac{C^{M-N}_{m-n}\,C^{N}_{n}}{C^{M}_{m}},$$

in which, for any p (say $p_s$), we employ for N the value N = R, and for n the
value n = s. As the denominator later cancels out it may be disregarded
throughout, simplifying the formula to

$$p = C^{M-N}_{m-n}\,C^{N}_{n}.$$

(A symbol such as C N n is read “the number of combinations of N things taken 
n at a time”; also written in several other forms.) 

The definition referred to may be worded as follows (Coolidge's own preferred
definition is not quite the same): 

“An event can happen in a certain number of ways, which are all equally 
likely. A certain proportion of these are classed as favorable. The ratio of 
the number of favorable ways to the total number is called the probability that 
the event will turn out favorably.” 

The principle underlying the Theory of Combinations may be quoted from 
Whitworth as follows (also found in ordinary works on algebra): 

“If one operation can be performed in m ways, and then a second can be per¬ 
formed in n ways, and then a third in r ways, (and so on), the number of ways 
of performing all the operations will be m X n X r X etc.” 

If it is not at once clear that the formula for evaluation of p follows from the 
definition and principle just quoted, the following considerations should make 
it evident. 

We are working in terms of a particular item belonging to a particular
Ideal-Check-List-Material cell, say cell N = R. “Favorable” occurrence requires
that this item fall in a particular Sample-Check-List cell, say n = s, while
falling in any other Sample-Check-List-Material cell (including cell n = 0 for
absence) is “unfavorable.” Again the real meaning of the “favorable”
occurrence is that the item will be found in just n = s out of the m specifications of
the sample, and absent in the remaining m − n specifications of the sample.
Moreover presence in Ideal-Check-List-Material cell N = R means that the
item occurs in just N = R of the M specifications that constitute the whole
class and is absent in M − N of these specifications. The total number of all
the ways (favorable and unfavorable) in which our event can happen means the
same as the total number of all the ways in which a group of m specifications
can be selected from a larger group of M, and this is, of course, written $C^{M}_{m}$ and
given us in our denominator. The number of favorable ways in which our
event can happen means the same as the number of ways in which N
specifications containing the item can form groups of n specifications while at the
same time M − N specifications not containing the item can form groups of
m − n specifications; the first distribution can be done in $C^{N}_{n}$ ways and the
second in $C^{M-N}_{m-n}$ ways, so by Whitworth’s principle the number of ways in which
these things can happen simultaneously is $C^{M-N}_{m-n}\,C^{N}_{n}$. Assembling numerator
and denominator we have the formula initially stated for evaluation of p, viz.:

$$p = \frac{C^{M-N}_{m-n}\,C^{N}_{n}}{C^{M}_{m}}.$$

This is the general formula; in applying to the particular example N = R, n = s
the replacements for N and n, of course, give

$$p_s = \frac{C^{M-R}_{m-s}\,C^{R}_{s}}{C^{M}_{m}}.$$
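The counting argument above leads to what is now usually called the hypergeometric probability. A minimal Python sketch, with M, m, N and n used as in the text and with illustrative values, is the following (`p_cell` is a hypothetical name):

```python
from fractions import Fraction
from math import comb

def p_cell(M, m, N, n):
    """Probability that an item present in N of the M specifications is found
    in exactly n of the m sample specifications."""
    return Fraction(comb(M - N, m - n) * comb(N, n), comb(M, m))

# Illustrative case: 20 specifications in the class, 4 in the sample,
# and an item that occurs in 10 of the 20.
M, m, N = 20, 4, 10
distribution = [p_cell(M, m, N, n) for n in range(m + 1)]
print(distribution)
```

Summed over n = 0, 1, …, m the probabilities total one, which serves as a check that every possible sample of m specifications has been counted exactly once.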

Having a means of evaluating P and p we may solve all needed Ultimate
Problems. The resulting solutions of the needed Ultimate Reliability Problems
(not necessarily completed) enable us to arrive at the solution of any needed
Elementary Reliability Problem in the form of a percentage which may be
called an Index of Reliability for the Sample-Check-List cell in question. In
computing this percentage we distinguish source-cells that belong to the Ideal
Check List from other source-cells that belong to the Ideal Check List Material
but not to the Ideal Check List.

By properly averaging cell-indices of Reliability (which are really Indices of 
Reliability for the individual items in the cells) we may obtain a solution of the 
General Problem of Reliability in the form of an Average Index of Reliability 
for the Sample Check List as a whole. 

In addition to the Average Index of Reliability for the Sample Check List 
we may easily secure also Average Indices of Reliability for any series of briefer
Sample Check Lists selected from the Sample Check List, by properly averaging 
the Indices of cells contained in any Sample Check List in question, keeping 
the original criterion for Ideal Check List. 

In practice it may not be necessary to compute all cell-Indices, as a portion 
of these may be entered in tables by any methods of interpolation regarded as 
acceptable. 

Problem of Completeness 

Again we have General, Elementary, and Ultimate Problems. These may 
be stated as follows: 

General Completeness Problem: What proportion of the items present in 
Ideal Check List may be expected to be present also in Sample Check List? 

Elementary Completeness Problem: What proportion of the items present in 
Ideal Check List may be expected to be present also in some designated cell in 
Sample Check List? 

Ultimate Completeness Problem: What proportion of the items in a particular
cell in Ideal Check List may be expected to be present also in some designated
cell in Sample Check List?



THEORY OF VALIDATION FOR DERIVATIVE SPECIFICATIONS 




To solve an Ultimate Problem: 

From principles already used the proportion to be expected is the same as the 
value of p alone in an Ultimate Reliability Problem, viz.: 

$$\frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By the use of this formula we may solve the Ultimate Problems for all values 
of N represented in Ideal Check List and all values of n represented in Sample 
Check List; some of these solutions will have a value of zero. 

For each value of n, if we properly average the solutions of the Ultimate
Problems, we obtain a solution of the Elementary Problem for one
Sample-Check-List cell in the form of a percentage which may be called the Index of
Completeness for the particular Sample-Check-List cell. In securing this
average it is necessary to multiply each Ultimate Problem solution by a relative
number corresponding to the assumed ratio of number of items in the particular
Ideal-Check-List cell to the number of items in all the Ideal-Check-List cells.
The source of the assumed relative numbers is the same as that used in
evaluating P in the Reliability Problem.
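The weighted averaging just described may be sketched as below. Everything numerical here is hypothetical: the class size, the sample size, the assumed item counts of the Ideal-Check-List cells, and the two membership criteria (N >= 10 and n >= 2) are illustrative assumptions only, as are the function names.

```python
from math import comb

def ultimate_p(M, N, m, n):
    """Solution of one Ultimate Completeness Problem (hypergeometric term)."""
    if not (0 <= n <= m and n <= N and m - n <= M - N):
        return 0.0
    return comb(M - N, m - n) * comb(N, n) / comb(M, m)

def cell_index_of_completeness(M, m, n, ideal_items):
    """Weighted average of Ultimate solutions for sample cell n.
    ideal_items maps each Ideal-Check-List cell value N to its assumed
    number of items; the weights are the relative numbers of items."""
    total = sum(ideal_items.values())
    return sum(count / total * ultimate_p(M, N, m, n)
               for N, count in ideal_items.items())

# Hypothetical data: class of 20, sample of 4, ideal cells N = 20, 15, 10.
M, m = 20, 4
ideal_items = {20: 5, 15: 10, 10: 15}
cell_indices = {n: cell_index_of_completeness(M, m, n, ideal_items)
                for n in range(m + 1)}
# Total Index for a Sample Check List whose (assumed) criterion is n >= 2.
total_index = sum(v for n, v in cell_indices.items() if n >= 2)
print(cell_indices, total_index)
```

Because the weights add to 1, the cell indices over all sample cells (including n = 0) also add to 1.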

When we have an Index of Completeness for each Sample-Check-List cell
we may obtain a Total Index of Completeness for the Sample Check List as a
whole by summing the cell-indices of Completeness of all the cells of the Sample
Check List. By an equivalent but preferable method we may divide the
last-named result by the sum of the cell-indices of Completeness of all the cells of
the Sample Check List Material (including cell n = 0); by this method the
$C^{M}_{m}$ of the original formula cancels out and so may be disregarded throughout.

A Total Index of Completeness is similarly obtainable for a Sample Check
List (any Sample Check List selected from the Sample Check List) by summing
the cell-indices of Completeness of the appropriate cells. Thus, if desired, a
tabulation may be made showing Indices of Completeness for a series of Sample
Check Lists differing in extent.

A combined tabulation may show for each of a series of Sample Check Lists 
its Index of Reliability and its Index of Completeness. 

More Complex Case 

So far we have considered a validation problem of simple type. In the writer's
Check List Materials for Public School Building Specifications⁵ a more complex
problem was presented, due to the introduction of the concept of the Applicable
Case. A Check List for School Building Specifications was developed with a
view to its use by school officials or others as an aid in judging proposed school
building specifications with reference to their completeness or incompleteness
of determination. The position was taken that a new specification ought not
to be charged with the omission of a given item unless the building (as
represented by the specification) had an Applicable Case for that item. To give a
single example, the Check List contains various items relating to the specifying
of marble work. It did not seem appropriate to score a specification down for
the omission of numerous determinations in marble work, if in fact there was no
marble in the building to be determined. This situation is expressed by saying
that there are no Applicable Cases for those items.

⁵ Byrne, L. Check List Materials for Public School Building Specifications. Teachers
College, Columbia University. 1931.

It seems likely that there are other research problems in which the question 
ought to be raised whether adequate treatment does not require the introduction 
of the concept of the Applicable Case. If so a more difficult validation problem 
is presented than would otherwise be the case. 

In the more complex case indicated, solution is obtained by making the necessary
extensions in the procedures followed for the simple case.


Modifications in Terms and Symbols

M (constant)    total number of specifications in class
D (variable)    number of these specifications containing an Applicable Case
                for a particular item
N (variable)    number of the latter specifications which also contain the
                particular item
m (constant)    number of specifications in sample
d (variable)    number of these specifications containing an Applicable Case
                for the particular item
n (variable)    number of the latter specifications which also contain the
                particular item


Values of d range from m to 0 by intervals of 1, and those of n range from d 
to 0 by intervals of 1. 

The convention is adopted that values of D range from M downward, and
those of N from D downward, by intervals of $M/m$.

(Tabulation) cell will mean the number of items (or the group of items) having 
a common value of d and a common value of n. 

The criterion for membership in the Sample Check List may, for illustrative 

purposes, be taken as n ^ ^ . 

The criterion for membership in the Ideal Check List may, for illustrative 
purposes, be taken as N ^ 


Problem of Reliability 

Following the same principle and line of reasoning as for the simple case we 
arrive at the same general formula for the solution of an Ultimate Reliability 
Problem, viz.: 

$$\frac{P_r\, p_r}{\sum P\, p}.$$





Meanings of values of P and p are the same as before except that cells must be
described respectively in terms of n and d values instead of n values alone, or
N and D values instead of N values alone.

$P_r$ is evaluated in the same manner as before, using the new meaning of
“cell.”

For $p_r$ the evaluation now becomes

$$p = \frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}}$$

which through cancellation may be simplified to the working formula

$$p = C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$

The reasoning leading to the denominator is unchanged and so this
denominator itself remains unchanged. The numerator for the evaluation of p is
altered to the extent shown by the consideration that, in producing “favorable”
ways, we now have to do with the number of simultaneous possibilities of
drawing n specifications from a group of N specifications containing a particular item,
drawing d - n specifications from a group of D - N specifications which
contain an Applicable Case for this particular item but do not contain this item
itself, and of drawing m - d specifications from a group of M - D specifications
which contain no Applicable Case for the item.
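The three simultaneous drawings described above can be counted directly. The sketch below is illustrative only; the function name and the figures (M = 12, D = 8, N = 5, m = 6) are hypothetical and are not taken from the paper.

```python
from math import comb

def ultimate_p_applicable(M, D, N, m, d, n):
    """Chance that a sample of m specifications shows d Applicable Cases,
    n of which contain the item, given D Applicable Cases in the class of M,
    N of which contain the item."""
    feasible = (0 <= n <= d <= m and n <= N
                and d - n <= D - N and m - d <= M - D)
    if not feasible:
        return 0.0
    ways = comb(M - D, m - d) * comb(D - N, d - n) * comb(N, n)
    return ways / comb(M, m)

# Hypothetical figures for illustration.
M, D, N, m = 12, 8, 5, 6
table = {(d, n): ultimate_p_applicable(M, D, N, m, d, n)
         for d in range(m + 1) for n in range(d + 1)}
print(round(table[(4, 2)], 6))
```

Over all admissible pairs (d, n) the entries of the table add to 1.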

Problem of Completeness 

Following the same principles and line of reasoning as for the simple case we 
arrive at the following formula for the solution of an Ultimate Completeness 
Problem: 

$$\frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By suitable treatment bringing about cancellations the working formula may
be reduced to

$$C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$

Techniques and Aids in Computation 

The present paper is limited to an attempt to explain with adequate fullness
the proposed theory of validation for derivative specifications and check lists,
and space is lacking in which to exhibit techniques of actual computation. One
specimen problem worked out in fairly complete detail, together with remarks
on available aids in computation, will be found in Appendix A3 in typewritten
copies of the writer’s “Check List Materials for Public School Building
Specifications” on file in the Library of Teachers College, Columbia University; the
Appendices are not included in the printed edition.





A NOTE ON SHEPPARD’S CORRECTIONS 

By Solomon Kullback

In this note we shall derive a simple relation between the characteristic 
function of the grouped distribution and the characteristic function of the 
original continuous distribution, assuming that the frequency curve has high 
contact with the x-axis at both ends. 

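If we set
$$p_s = \int_{x_s - w/2}^{x_s + w/2} f(x)\,dx,$$
then the characteristic function of the grouped distribution is given by

(1) $\bar{\varphi}(t) = \sum_s e^{itx_s}\, p_s$,

where $i = \sqrt{-1}$. Replacing $p_s$ by its value as given above, we have

(2)
$$\begin{aligned}
\bar{\varphi}(t) &= \sum_s e^{itx_s} \int_{x_s - w/2}^{x_s + w/2} f(x)\,dx \\
&= \sum_s e^{itx_s} \int_{-w/2}^{w/2} f(x + x_s)\,dx \\
&= \int_{-w/2}^{w/2} dx \sum_s e^{itx_s} f(x + x_s) \\
&= \sum_s e^{itx_s} f(x_s) \int_{-w/2}^{w/2} e^{-itx}\,dx.
\end{aligned}$$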

There is no difficulty about justifying the inversion of the order of integration 
and summation. 

Because of the assumption of high-contact with the axis of x at both ends of 
the frequency curve, we have 

(3) $\varphi(t) = \int e^{itx} f(x)\,dx = w \sum_s e^{itx_s} f(x_s)$

so that 


(4) $\bar{\varphi}(t) = \varphi(t)\,\dfrac{1}{w}\displaystyle\int_{-w/2}^{w/2} e^{-itx}\,dx = \varphi(t)\,\dfrac{\sin (tw/2)}{tw/2}.$


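This is the desired result, from which there follow the desired moment
relations by equating coefficients of $(it)^r$ on both sides of the equation. For
example:

$$\begin{aligned}
1 + M_1\,it + \frac{M_2}{2!}(it)^2 + \frac{M_3}{3!}(it)^3 + \cdots
&= \Bigl(1 + \frac{(it)^2 w^2}{2^2\,3!} + \frac{(it)^4 w^4}{2^4\,5!} + \cdots\Bigr)
   \Bigl(1 + m_1\,it + \frac{m_2}{2!}(it)^2 + \cdots\Bigr) \\
&= 1 + m_1\,it + \frac{(it)^2}{2!}\Bigl(m_2 + \frac{w^2}{12}\Bigr)
   + \frac{(it)^3}{3!}\Bigl(m_3 + \frac{m_1 w^2}{4}\Bigr) + \cdots
\end{aligned}$$

or

$$M_1 = m_1; \qquad M_2 = m_2 + \frac{w^2}{12}; \qquad M_3 = m_3 + \frac{m_1 w^2}{4}; \ \cdots$$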
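The first three moment relations can be checked numerically. The sketch below assumes, purely for illustration, a normal frequency curve (which has high contact at both ends) with hypothetical mean 0.3, unit standard deviation, class width w = 0.5, and class marks at the multiples of w; none of these choices comes from the note. M denotes moments of the grouped distribution and m those of the continuous one, both about the origin.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def grouped_moments(mu, sigma, w, half_cells=60):
    """First three moments about the origin of the grouped distribution,
    with class marks x_s = s * w and class frequencies p_s."""
    M1 = M2 = M3 = 0.0
    for s in range(-half_cells, half_cells + 1):
        x_s = s * w
        p_s = (normal_cdf(x_s + w / 2, mu, sigma)
               - normal_cdf(x_s - w / 2, mu, sigma))
        M1 += x_s * p_s
        M2 += x_s ** 2 * p_s
        M3 += x_s ** 3 * p_s
    return M1, M2, M3

mu, sigma, w = 0.3, 1.0, 0.5          # hypothetical illustration values
m1 = mu                               # moments of the continuous curve
m2 = sigma ** 2 + mu ** 2
m3 = mu ** 3 + 3 * mu * sigma ** 2
M1, M2, M3 = grouped_moments(mu, sigma, w)
print(M1 - m1)                        # each difference should be negligible
print(M2 - (m2 + w ** 2 / 12))
print(M3 - (m3 + m1 * w ** 2 / 4))
```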


Washington, D. C. 



THE LIMITING DISTRIBUTIONS OF CERTAIN STATISTICS 1 


By J. L. Doob 

There have been many advances in the theory of probability in recent years,
especially relating to its mathematical basis. Unfortunately, there appears to
be no source readily available to the ordinary American statistician which
sketches these results and shows their application to statistics. It is the purpose
of this paper to define the basic concepts and state the basic theorems of
probability, and then, as an application, to find the limiting distributions for large
samples of a large class of statistics. One of these statistics is the tetrad
difference, which has been of much concern to psychologists.

I 

Let $F(x)$ be a monotone non-decreasing function, continuous on the left,
defined at every point of the x-axis, and satisfying the conditions

(1) $\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$

Then the function $F(x)$ is said to be the distribution function of a chance variable
$\mathbf{x}$, and $F(x)$ is said to be the probability that $\mathbf{x} < x$. The curve $y = F(x)$ is
sometimes called the ogive in statistics. The chance variable $\mathbf{x}$ itself is merely
the function $x$, taken in conjunction with the monotone function $F(x)$.

If $\int_{-\infty}^{\infty} x\,dF(x)$ exists as an absolutely convergent Stieltjes integral, the value
of the integral is called the expectation of $\mathbf{x}$, and will be denoted by $E(\mathbf{x})$.

II 

Let $F(x_1, \cdots, x_n)$ be a function defined over n-dimensional space, which is
monotone, non-decreasing, continuous on the left in each coordinate if the others
are held fast, and which satisfies the conditions

(2) $\lim_{x_j \to -\infty} F(x_1, \cdots, x_n) = 0,\ j = 1, \cdots, n, \qquad \lim_{x_1, \cdots, x_n \to \infty} F(x_1, \cdots, x_n) = 1,$

where in the last limit, $x_1, \cdots, x_n$ become infinite together. Then $F(x_1, \cdots, x_n)$
is said to be the distribution function of a set of chance variables $\mathbf{x}_1, \cdots, \mathbf{x}_n$,
and $F(x_1, \cdots, x_n)$ is said to be the probability that all the inequalities $\mathbf{x}_j < x_j$
$(j = 1, \cdots, n)$ hold simultaneously. It can be shown that the function

$$F_j(x) = \lim_{\xi_1, \cdots, \xi_{j-1}, \xi_{j+1}, \cdots, \xi_n \to \infty} F(\xi_1, \cdots, \xi_{j-1}, x, \xi_{j+1}, \cdots, \xi_n)$$

is of the type discussed in §I. The function $F_j(x)$ is called the distribution
function of $\mathbf{x}_j$. The chance variables $\mathbf{x}_1, \cdots, \mathbf{x}_n$ are called independent if
$F(x_1, \cdots, x_n) = \prod_{j=1}^{n} F_j(x_j)$. The chance variables $\mathbf{x}_1, \cdots, \mathbf{x}_n$ are merely
the functions $x_1, \cdots, x_n$ defined over n-dimensional space, taken in conjunction
with the function $F(x_1, \cdots, x_n)$.

¹ Research under a grant-in-aid from the Carnegie Corporation.

If $a_1, \cdots, a_n$ are any real numbers, the number $F(a_1, \cdots, a_n)$, the
probability that $\mathbf{x}_j < a_j$, $j = 1, \cdots, n$, is also called the probability that a sample
$(x_1, \cdots, x_n)$ shall be in the region of n-dimensional space determined by
$x_j < a_j$, $j = 1, \cdots, n$. Thus regions of this special type have probabilities
attached to them. Using the usual additivity rules, probabilities can be
attached to more general regions, and in fact probability can be defined on a
collection C of regions including all open sets, closed sets and all sets which can
be obtained from them by repeatedly taking sums, products, and complements.
(Such point sets are called Borel measurable.) The resulting function of point
sets is non-negative and completely additive.²

If $f(x_1, \cdots, x_n)$ is any function of $x_1, \cdots, x_n$ let $E_x$ be the set of points
$(x_1, \cdots, x_n)$ where $f < x$. Suppose that $E_x$ is in the collection C for all values
of $x$, and let $F(x)$ be the probability attached to the set $E_x$. Then it is readily
seen that $F(x)$ has the properties discussed in §I and is therefore the distribution
function of a new chance variable $\mathbf{x}$, which will be denoted by $f(\mathbf{x}_1, \cdots, \mathbf{x}_n)$.
The chance variable $f(\mathbf{x}_1, \cdots, \mathbf{x}_n)$ is merely the function $f(x_1, \cdots, x_n)$ taken
in conjunction with the distribution function $F(x_1, \cdots, x_n)$. (An example is
$f(x_1, \cdots, x_n) = x_1 + \cdots + x_n$, determining the chance variable $\mathbf{x}_1 + \cdots + \mathbf{x}_n$.)
Suppose that $E(\mathbf{x})$ exists,

(3) $E(\mathbf{x}) = \int_{-\infty}^{\infty} x\,dF(x).$

Then it can be shown that the n-dimensional (Lebesgue-)Stieltjes integral

(4) $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,dF(x_1, \cdots, x_n)$

exists and has the value $E(\mathbf{x})$. Conversely the existence of the integral (4)
implies that of (3).

If there is a Lebesgue-integrable function $\varphi(x_1, \cdots, x_n)$ such that

(5) $F(x_1, \cdots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} \varphi(\xi_1, \cdots, \xi_n)\,d\xi_1 \cdots d\xi_n,$

the function $\varphi$ is said to be the density function of the distribution. In this
case (4) becomes

(4') $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,\varphi(x_1, \cdots, x_n)\,dx_1 \cdots dx_n.$

The probability attached to a point set E in the collection C is the integral
(4) (or (4') if there is a density function), where $f = 1$ over E and $f = 0$
elsewhere.

² That is, if $p(E)$ is the value of the set function on the set E, and if $E_1, E_2, \cdots$ are
point sets with no common points, and which are in C, $p\bigl(\sum_{m=1}^{\infty} E_m\bigr) = \sum_{m=1}^{\infty} p(E_m)$.



III

Let $x, x_1, x_2, \cdots$ be a sequence of chance variables. We suppose that for
every integer n, $x$, $x_n$ determine a bivariate distribution. Then it is readily
seen from §II that there is a chance variable $|x_n - x|$ and therefore that
$P\{|x_n - x| \le \lambda\}$³ is defined for every number $\lambda$. If

(6) $\lim_{n \to \infty} P\{|x_n - x| \le \lambda\} = 1$

for every positive number $\lambda$, the sequence $\{x_n\}$ is said to converge stochastically,
or to converge in probability, to $x$. If a is a constant, $P\{|x_n - a| \le \lambda\}$ is also
defined for every number $\lambda$, and there is a corresponding definition of stochastic
convergence to a. The usual theorems about limits hold: if $x_n$, $y_n$ converge
stochastically to $x$, $y$, $x_n + y_n$ converges stochastically to $x + y$, etc.

An example of stochastic convergence is given by the law of large numbers.
Let $x$ be a chance variable with distribution function $F(x)$ and suppose that
$E(x)$, $E(x^2)$ exist, i.e. that

$$\int_{-\infty}^{\infty} x\,dF(x), \qquad \int_{-\infty}^{\infty} x^2\,dF(x)$$

are absolutely convergent integrals. Let $x_1, \cdots, x_n$ be chance variables whose
n-variate distribution function is $\prod_{j=1}^{n} F(x_j)$: we are thus supposing that the
variables all have the same distribution and form an independent set. Then
$\frac{1}{n}\sum_{j=1}^{n} x_j$ is a new chance variable, and Tchebycheff's inequality furnishes an
immediate proof that $\frac{1}{n}\sum_{j=1}^{n} x_j$ converges stochastically to $E(x)$.⁴


³ Throughout this paper, if y represents a set of conditions on chance variables, P{y}
will denote the probability that those conditions are satisfied.

⁴ If $\bar{x}_n = \frac{1}{n}\sum_{j=1}^{n} x_j$, then $E(\bar{x}_n) = E(x)$ and $E\{[\bar{x}_n - E(x)]^2\} = \frac{1}{n}E\{[x - E(x)]^2\}$.
Then if $\lambda$ is any positive number,
$$P\{|\bar{x}_n - E(x)| > \lambda\} \le \frac{E\{[x - E(x)]^2\}}{n\lambda^2},$$
which implies (6).
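The stochastic convergence of the sample mean can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the paper: it takes x uniform on the unit interval (so that E(x) = 1/2 and the variance is 1/12), fixes λ = 0.05, and estimates the probability in (6) by repeated sampling; the function names are hypothetical.

```python
import random

def sample_mean(n, rng):
    # x is uniform on [0, 1): E(x) = 1/2 and E(x^2) exists.
    return sum(rng.random() for _ in range(n)) / n

def prob_within(n, lam, trials, rng):
    """Monte Carlo estimate of P{|mean of n values - E(x)| <= lam}."""
    hits = sum(abs(sample_mean(n, rng) - 0.5) <= lam for _ in range(trials))
    return hits / trials

rng = random.Random(0)
lam = 0.05
for n in (10, 100, 1000):
    estimate = prob_within(n, lam, 2000, rng)
    # Tchebycheff lower bound 1 - Var(x) / (n * lam^2), floored at zero.
    bound = max(1 - (1 / 12) / (n * lam ** 2), 0.0)
    print(n, estimate, round(bound, 4))
```

The estimated probabilities rise toward 1 as n grows, and each should lie above the (often crude) Tchebycheff bound.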





There is also another kind of convergence, called convergence with
probability 1. The sequence $\{x_n\}$ converges with probability 1 to $x$ if

(7) $\lim_{n \to \infty} P\{|x_n - x| \le \lambda,\ |x_{n+1} - x| \le \lambda,\ \cdots,\ |x_{n+p} - x| \le \lambda\} = 1$

uniformly in $p \ge 0$, for every positive number $\lambda$. If
$p = 0$ in (7), (7) becomes (6), so that convergence with probability 1 implies
stochastic convergence. Although the converse is not true, if $\{x_n\}$ is a sequence
of chance variables converging stochastically to $x$, there is a subsequence of
$\{x_n\}$ which converges with probability 1 to $x$.⁵ The usual limit theorems hold
here also: if $x_n$, $y_n$ converge with probability 1 to $x$, $y$, $x_n + y_n$ converges with
probability 1 to $x + y$, etc.

An example of convergence with probability 1 is the following. If in the
previous example the hypothesis that $E(x^2)$ exists is removed, so that only the
weaker hypothesis of the existence of $E(x)$ is supposed, the Tchebycheff
inequality can no longer be applied, but a different method shows that $\frac{1}{n}\sum_{j=1}^{n} x_j$
converges with probability 1 (and therefore stochastically) to $E(x)$.⁶ This result
is known as the strong law of large numbers.

IV 

Let $x, x_1, x_2, \cdots$ be a sequence of chance variables with distribution functions
$F(x), F_1(x), F_2(x), \cdots$ respectively. Then if $\lim_{n \to \infty} F_n(x) = F(x)$ for every value
of $x$, the distribution of $x_n$ is said to converge to a limiting distribution with
distribution function $F(x)$.

As an example, consider the Laplace-Liapounoff theorem. Let $x_1, x_2, \cdots$
be a sequence of independent chance variables (i.e. any finite number of them
form an independent set) with the same distribution functions, and let $E(x_n)$,
$E(x_n^2)$ exist. We suppose that $\sigma^2 = E\{[x_n - E(x_n)]^2\} > 0$ so that the
distribution of $x_n$ is not merely confined to one point. Then the distribution of

(8) $n^{-1/2} \sum_{j=1}^{n} [x_j - E(x_j)]$

converges to a limiting distribution with distribution function⁷

(9) $\dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} e^{-t^2/2\sigma^2}\,dt.$

⁵ The theories of probability and of measure are fundamentally identical. Chance
variables correspond to measurable functions. Stochastic convergence corresponds to
convergence in measure, and convergence with probability 1 corresponds to convergence
almost everywhere. The relation between these two types of convergence is discussed
(in the terminology of the measure theory) in E. W. Hobson, The Theory of Functions of
a Real Variable, second edition, Vol. 2, pp. 239-244.

⁶ Cf. for instance J. L. Doob, Transactions of the American Mathematical Society,
Vol. 36 (1934), pp. 764-765.

The convergence of a sequence of n-variate distributions is defined as the
convergence of the distribution functions just as above for n = 1. Suppose
that $(x_{11}, \cdots, x_{n1})$, $(x_{12}, \cdots, x_{n2})$, $\cdots$ are independent sets of chance
variables (i.e. the distribution function of any finite number of sets is the product
of the distribution functions of the sets) with the same distribution functions.
We suppose that $E(x_{j1})$, $E(x_{j1}^2)$ exist, $j = 1, \cdots, n$, and that
$\sigma_j^2 = E\{[x_{j1} - E(x_{j1})]^2\} > 0$. Then if $\bar{x}_{jm} = m^{-1/2} \sum_{i=1}^{m} [x_{ji} - E(x_{ji})]$, the n-variate
distribution of $\bar{x}_{1m}, \cdots, \bar{x}_{nm}$ converges to the normal distribution⁸ about zero means with
variances $\sigma_1^2, \cdots, \sigma_n^2$ and correlation coefficients $\{\rho_{ij}\}$ where
$\sigma_i \sigma_j \rho_{ij} = E\{[x_{i1} - E(x_{i1})][x_{j1} - E(x_{j1})]\}$.

Three lemmas will be needed below in applying these concepts. 

Lemma 1. If $\{x_n\}$ is a sequence of chance variables whose distributions approach
a limiting distribution and if $\{y_n\}$ is a sequence of chance variables converging
stochastically to 0, the sequence $\{x_n y_n\}$ converges stochastically to 0.

For if $F(x)$ is the distribution function of the limiting distribution, and if $\lambda$,
$\mu$ are any positive numbers,

(10)
$$\begin{aligned}
P\{|x_n y_n| < \lambda\} &\ge P\{|x_n y_n| < \lambda,\ |y_n| \le \mu\} \ge P\{|x_n| < \lambda/\mu,\ |y_n| \le \mu\} \\
&\ge P\{|y_n| \le \mu\} - P\{|x_n| \ge \lambda/\mu\} = -P\{|y_n| > \mu\} + P\{|x_n| < \lambda/\mu\} \\
&\ge -P\{|y_n| > \mu\} + P\{x_n < \lambda/\mu\} - P\{x_n < -\lambda/2\mu\}.
\end{aligned}$$

Then, letting n become infinite,

(11) $\liminf_{n \to \infty} P\{|x_n y_n| < \lambda\} \ge F(\lambda/\mu) - F(-\lambda/2\mu).$⁹

Letting $\mu$ approach 0, $F(\lambda/\mu)$ approaches 1, $F(-\lambda/2\mu)$ approaches 0, and the
right-hand side becomes 1, as was to be proved.

Lemma 2. Let $\{x_n\}$, $\{y_n\}$, $\{z_n\}$ be sequences of chance variables such that the
distribution of $x_n$ approaches a limiting distribution with continuous distribution
function $F(x)$ and such that the sequences $\{y_n\}$, $\{z_n\}$ converge stochastically to 0,
1 respectively. Then the distributions of $\{x_n/z_n\}$¹⁰ and of $x_n + y_n$ approach
limiting distributions with the same distribution function $F(x)$.

⁷ A. Khintchine, Ergebnisse der Mathematik, Vol. 2, No. 4: Asymptotische Gesetze der
Wahrscheinlichkeitsrechnung, pp. 1-8.

⁸ Ibid. pp. 11-16.

⁹ If $\{a_n\}$ is a sequence of real numbers, $\limsup_{n \to \infty} a_n$ is defined as
$\lim_{n \to \infty}$ (least upper bound $\{a_n, a_{n+1}, \cdots\}$), and $\liminf_{n \to \infty} a_n$ is defined as
$-\limsup_{n \to \infty} (-a_n)$. A necessary and sufficient condition that the sequence $\{a_n\}$
converge to a limit a is that $\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = a$.

¹⁰ Since $z_n$ converges stochastically to 1, the probability that $z_n = 0$ approaches 0. The
theorem is independent of the way $x_n/z_n$ is defined when $z_n = 0$.





Since $\dfrac{x_n}{z_n} = x_n + x_n \dfrac{1 - z_n}{z_n}$ (neglecting the possibility that $z_n$ may vanish),
where the last term converges stochastically to 0 by Lemma 1, it is sufficient
to prove the second part of the theorem. If $\epsilon > 0$, and if $x$ is an arbitrary
number,

(12) $P\{x_n + y_n < x\} = P\{x_n + y_n < x,\ |y_n| \le \epsilon\} + P\{x_n + y_n < x,\ |y_n| > \epsilon\}.$

Since the sequence $\{y_n\}$ converges stochastically to 0,

(13) $\lim_{n \to \infty} P\{x_n + y_n < x,\ |y_n| > \epsilon\} \le \lim_{n \to \infty} P\{|y_n| > \epsilon\} = 0$

so that in the limit the second term in (12) can be neglected. Moreover

(14) $P\{x_n + y_n < x,\ |y_n| \le \epsilon\} \le P\{x_n < x + \epsilon\}.$

If we let n become infinite and then let $\epsilon$ approach 0, (14) becomes

(15) $\limsup_{n \to \infty} P\{x_n + y_n < x\} \le F(x).$

A similar argument shows that

(16) $\liminf_{n \to \infty} P\{x_n + y_n < x\} \ge F(x),$

and (15), (16) taken together imply that

(17) $\lim_{n \to \infty} P\{x_n + y_n < x\} = F(x),$

as was to be proved.

Lemma 3. If $x_1, x_2, x_3, x_4$ are chance variables whose distribution has density
function

$$\frac{1}{(2\pi)^2}\, e^{-\frac{1}{2}(x_1^2 + x_2^2 + x_3^2 + x_4^2)},$$

the distribution of $z = x_1 x_2 - x_3 x_4$ has density function $\frac{1}{2} e^{-|x|}$.

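The distribution of $u = x_1 x_2$ and that of $v = -x_3 x_4$ have the same density
function:

(18) $\dfrac{1}{\pi} \displaystyle\int_0^{\infty} e^{-\frac{1}{2}\left(t^2 + \frac{x^2}{t^2}\right)} \frac{dt}{t}.$

Hence the distribution of z has density function

(19) $\dfrac{1}{\pi^2} \displaystyle\int_{-\infty}^{\infty} d\lambda \int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{1}{2}\left(t^2 + \tau^2 + \frac{\lambda^2}{t^2} + \frac{(x - \lambda)^2}{\tau^2}\right)} \frac{dt\,d\tau}{t\,\tau}.$

If we change to polar coordinates: $t = r\cos\theta$, $\tau = r\sin\theta$, and integrate out $\lambda$,
we obtain

$$\frac{1}{\pi}\sqrt{\frac{2}{\pi}} \int_0^{\pi/2}\!\!\int_0^{\infty} e^{-\frac{1}{2}\left(r^2 + \frac{x^2}{r^2}\right)} dr\,d\theta = \frac{1}{2} e^{-|x|}.$$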
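Lemma 3 lends itself to a quick sampling check. The sketch below is illustrative only (seed, sample size and names are arbitrary choices): it draws four independent standard normal values, forms z, and compares three sample quantities with what the density ½e^{-|x|} implies, namely E|z| = 1, E(z²) = 2 and P{|z| ≤ 1} = 1 - 1/e.

```python
import math
import random

def simulate_z(n_samples, seed=0):
    """Draw z = x1*x2 - x3*x4 with x1..x4 independent standard normal."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) * rng.gauss(0, 1) - rng.gauss(0, 1) * rng.gauss(0, 1)
            for _ in range(n_samples)]

zs = simulate_z(200_000)
mean_abs = sum(abs(z) for z in zs) / len(zs)            # compare with 1
second_moment = sum(z * z for z in zs) / len(zs)        # compare with 2
frac_within_1 = sum(abs(z) <= 1 for z in zs) / len(zs)  # compare with 1 - 1/e
print(mean_abs, second_moment, frac_within_1, 1 - math.exp(-1))
```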



Theorem 1. Let $x_1, x_2, x_3, x_4$ determine a 4-variate distribution with
distribution function $F(x_1, x_2, x_3, x_4)$. Suppose that $E(x_i)$, $E(x_i^2)$, $E(x_i^2 x_j^2)$ exist,
$i, j = 1, \cdots, 4$, and suppose that $E(x_i) = 0$, $E(x_i^2) = 1$,¹¹ $i = 1, 2, 3, 4$.
Let $x_{1j}, x_{2j}, x_{3j}, x_{4j}$ have the same 4-variate distribution as $x_1, x_2, x_3, x_4$, $j = 1, \cdots, n$,
and let the 4n-variate distribution function of $\{x_{ij}\}$ be $\prod_{j=1}^{n} F(x_{1j}, x_{2j}, x_{3j}, x_{4j})$. We
shall use the following notation (which suppresses the dependence on n):

(20) $\xi_i = \dfrac{1}{n} \sum_{k=1}^{n} x_{ik}, \qquad s_{ij} = \dfrac{1}{n} \sum_{k=1}^{n} x_{ik} x_{jk}, \qquad \rho_{ij} = E(x_i x_j).$

Let $\varphi$ be a function of $\xi_i$, $s_{ij}$, defined in a neighborhood N of P: $\xi_i = 0$, $s_{ij} = \rho_{ij}$,
which, together with its second partial derivatives, is continuous in N. Define
$\sigma \ge 0$ by

(21) $\sigma^2 = E\Bigl\{\Bigl[\sum_{i=1}^{4} \dfrac{\partial\varphi}{\partial\xi_i}\, x_i - \sum_{i \le j} \dfrac{\partial\varphi}{\partial s_{ij}}\, (\rho_{ij} - x_i x_j)\Bigr]^2\Bigr\}$

where the partial derivatives are evaluated at P. Then if $\sigma > 0$, the distribution of
$\sqrt{n}\,[\varphi - \varphi(P)]$ (where $\varphi$ has the arguments $\xi_i$, $s_{ij}$) converges to a limiting
distribution which is normal with mean 0 and variance $\sigma^2$.

To prove this theorem we expand $\varphi$ in the neighborhood of P, obtaining

(22) $\sqrt{n}\,[\varphi - \varphi(P)] = \sum_{i=1}^{4} \dfrac{\partial\varphi}{\partial\xi_i}\, \sqrt{n}\,\xi_i - \sum_{i \le j} \dfrac{\partial\varphi}{\partial s_{ij}}\, \sqrt{n}\,(\rho_{ij} - s_{ij}) + R_n$

where the partial derivatives are evaluated at P, and where $R_n$ consists of a
linear combination of $\sqrt{n}\,\xi_i \xi_j$, $\sqrt{n}\,\xi_i(\rho_{jk} - s_{jk})$, $\sqrt{n}\,(\rho_{ij} - s_{ij})(\rho_{kl} - s_{kl})$, with
coefficients which are uniformly bounded as long as $\xi_i$, $s_{ij}$ are in the
neighborhood N. Now

(23) $\lim_{n \to \infty} \xi_i = 0, \qquad \lim_{n \to \infty} s_{ij} = \rho_{ij}$

with probability 1, by the law of large numbers, and as n becomes infinite the
distributions of $\sqrt{n}\,\xi_i$, $\sqrt{n}\,(\rho_{ij} - s_{ij})$ converge to limiting distributions, by the
Laplace-Liapounoff theorem. Then by Lemma 1, the terms of $R_n$ converge
stochastically to 0. The other terms of $\sqrt{n}\,[\varphi - \varphi(P)]$ are sums to which the
Laplace-Liapounoff theorem can be applied, giving the desired conclusion.

¹¹ The hypothesis that $E(x_i) = 0$ involves no real restriction, since the general case can
be reduced to this one by substituting $x_i - E(x_i)$ for $x_i$. The hypothesis that $E(x_i^2) = 1$
can be met by substituting $x_i [E(x_i^2)]^{-1/2}$ whenever $E(x_i^2) > 0$, which will always be true
unless $x_i = 0$ with probability 1.

As an example of the application of this theorem, we suppose that $\varphi$ is a
correlation coefficient:

(24) $\varphi = \dfrac{s_{12}}{(s_{11} s_{22})^{1/2}}, \qquad \varphi(P) = \rho_{12}.$

Here $\sigma^2$ is $E\{[x_1 x_2 - \tfrac{1}{2}\rho_{12}(x_1^2 + x_2^2)]^2\}$ (which reduces to the familiar result $(1 - \rho_{12}^2)^2$
when the bivariate distribution of $x_1$, $x_2$ is normal) and $\sigma = 0$ only when, with
probability 1,

(25) $2 x_1 x_2 = \rho_{12}(x_1^2 + x_2^2).$

As a second example we suppose that $\varphi$ is a tetrad difference:

(26) $\varphi = \dfrac{s_{13} s_{24} - s_{14} s_{23}}{(s_{11} s_{22} s_{33} s_{44})^{1/2}}, \qquad \varphi(P) = \rho_{13}\rho_{24} - \rho_{14}\rho_{23}.$

Here $\sigma^2$ becomes

(27) $\sigma^2 = E\Bigl\{\Bigl[\rho_{24} x_1 x_3 + \rho_{13} x_2 x_4 - \rho_{14} x_2 x_3 - \rho_{23} x_1 x_4 - \dfrac{\varphi(P)}{2} \sum_{i=1}^{4} x_i^2\Bigr]^2\Bigr\}$

and $\sigma = 0$ only when the quantity in the brackets vanishes with probability 1.

If in either of the two above cases $s_{ij} - \xi_i \xi_j$ is substituted for $s_{ij}$ (i.e. if the
deviations from the sample mean, not those from the true mean, are used), the
result is unaltered. This is true in general, since $\dfrac{\partial\varphi}{\partial\xi_i}$, $\dfrac{\partial\varphi}{\partial s_{ij}}$ are unaltered at P by
this substitution.
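For the first example (the correlation coefficient), the reduction of σ² to (1 - ρ₁₂²)² in the normal case can be checked by sampling. The sketch below is illustrative: the value ρ₁₂ = 0.6, the seed and the sample size are arbitrary assumptions, and the function name is hypothetical.

```python
import math
import random

def sigma_squared_estimate(rho, n_samples, seed=0):
    """Monte Carlo estimate of E{[x1*x2 - (rho/2)(x1^2 + x2^2)]^2} for a
    bivariate normal pair with zero means, unit variances, correlation rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = g1
        x2 = rho * g1 + math.sqrt(1 - rho ** 2) * g2  # gives correlation rho
        q = x1 * x2 - 0.5 * rho * (x1 ** 2 + x2 ** 2)
        total += q * q
    return total / n_samples

rho = 0.6                                   # hypothetical value
estimate = sigma_squared_estimate(rho, 200_000)
print(estimate, (1 - rho ** 2) ** 2)        # the two should be close
```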

There is a well-known δ-method used in statistics to find limiting variances
of statistics of the type covered by Theorem 1,¹² and Theorem 1 shows an
interpretation which can be given to the results obtained by this method.

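We now investigate the necessary modification of Theorem 1 if $\sigma = 0$, i.e. if

(28) $\sum_{i=1}^{4} \dfrac{\partial\varphi}{\partial\xi_i}\, x_i - \sum_{i \le j} \dfrac{\partial\varphi}{\partial s_{ij}}\, (\rho_{ij} - x_i x_j) = 0$

with probability 1. If we assume that $\varphi$ has continuous third partial
derivatives in the neighborhood N, we find that

(29)
$$\begin{aligned}
n[\varphi - \varphi(P)] ={}& \frac{1}{2} \sum_{i,j} \frac{\partial^2\varphi}{\partial\xi_i\,\partial\xi_j}\, n\,\xi_i \xi_j
 + \sum_{i;\ j \le k} \frac{\partial^2\varphi}{\partial\xi_i\,\partial s_{jk}}\, n\,\xi_i (s_{jk} - \rho_{jk}) \\
&+ \frac{1}{2} \sum_{i \le j;\ k \le l} \frac{\partial^2\varphi}{\partial s_{ij}\,\partial s_{kl}}\, n\,(s_{ij} - \rho_{ij})(s_{kl} - \rho_{kl}) + R_n
\end{aligned}$$

where $R_n$ converges stochastically to 0. The second degree terms constitute
a quadratic form in $\{\sqrt{n}\,\xi_i,\ \sqrt{n}\,(s_{jk} - \rho_{jk})\}$. Now the multivariate distribution of
$\{\sqrt{n}\,\xi_i,\ \sqrt{n}\,(s_{jk} - \rho_{jk})\}$, by the Laplace-Liapounoff theorem, converges to a
normal distribution whose variances and correlation coefficients are those of
$x_i$, $x_j x_k$. The distribution of $n[\varphi - \varphi(P)]$ thus converges to the distribution
of the quadratic form

(30) $\dfrac{1}{2} \sum_{i,j} \dfrac{\partial^2\varphi}{\partial\xi_i\,\partial\xi_j}\, \alpha_i \alpha_j + \sum_{i;\ j \le k} \dfrac{\partial^2\varphi}{\partial\xi_i\,\partial s_{jk}}\, \alpha_i \beta_{jk} + \dfrac{1}{2} \sum_{i \le j;\ k \le l} \dfrac{\partial^2\varphi}{\partial s_{ij}\,\partial s_{kl}}\, \beta_{ij} \beta_{kl},$

where $\{\alpha_i\}$, $\{\beta_{kl}\}$ have the multivariate distribution just described, unless the
quadratic form vanishes identically. This reasoning can be continued, the
general result being that there is some power $\nu$ of n, if $\varphi$ is sufficiently regular,
such that the distribution of $n^{\nu}[\varphi - \varphi(P)]$ converges to a limiting distribution.

¹² Examples of the use of this method can be found in T. L. Kelley, Crossroads in the
Mind of Man, Stanford University (1928), pp. 49-50, and in an article by S. Wright, Annals
of Mathematical Statistics, Vol. 5 (1934), p. 211.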

When $\sigma = 0$ in the second example, unless the distribution of $x_1, x_2, x_3, x_4$
is confined with probability 1 to a 4-dimensional quadric, $\rho_{13} = \rho_{14} = \rho_{23} =
\rho_{24} = 0$. Equation (29) becomes

(29') $n[\varphi - \varphi(P)] = n(s_{13} s_{24} - s_{14} s_{23}) + R_n.$

Now if $x_1$, $x_2$ are transformed by a linear homogeneous transformation with
determinant $\Delta$, it is readily seen that $s_{13} s_{24} - s_{14} s_{23}$ is multiplied by $\Delta$. The
same is true of $x_3$, $x_4$. If $x_1$, $x_2$ are transformed into $x_1'$, $x_2'$ so that $E(x_i'^2) = 1$,
$E(x_1' x_2') = 0$, the determinant of the transformation is $\pm(1 - \rho_{12}^2)^{-1/2}$. Then
transforming each pair $(x_1, x_2)$, $(x_3, x_4)$ in this way into $(x_1', x_2')$, $(x_3', x_4')$, the
variables $x_1', \cdots, x_4'$ are uncorrelated. If $s_{ij}' = \dfrac{1}{n} \sum_{k=1}^{n} x_{ik}' x_{jk}'$,

(31) $s_{13} s_{24} - s_{14} s_{23} = \dfrac{s_{13}' s_{24}' - s_{14}' s_{23}'}{\pm(1 - \rho_{12}^2)^{-1/2}(1 - \rho_{34}^2)^{-1/2}}.$

The limiting distribution of $n(s_{13}' s_{24}' - s_{14}' s_{23}')$ is the distribution of $\beta_{13}'\beta_{24}' - \beta_{14}'\beta_{23}'$
where these four chance variables are normally distributed, $E(\beta_{13}') = E(\beta_{24}') =
E(\beta_{14}') = E(\beta_{23}') = 0$, $E(\beta_{ij}'^2) = E(x_i'^2 x_j'^2)$, $E(\beta_{ij}'\beta_{kl}') = E(x_i' x_j' x_k' x_l')$. Now if
$x_1, x_2, x_3, x_4$ are normally distributed (the most important case for statistical
purposes) $x_1', x_2', x_3', x_4'$ will also be distributed normally, and the vanishing of
the correlation coefficients means that the chance variables are independent.
If this is true

(32) $E(\beta_{ij}'^2) = 1, \qquad E(\beta_{ij}'\beta_{kl}') = 0 \quad (\beta_{ij}' \ne \beta_{kl}').$






Evidently, however, $x_1', x_2', x_3', x_4'$ do not have to be independent to make these
equations valid. It is more than sufficient if the pairs $(x_1, x_2)$, $(x_3, x_4)$ and
therefore the pairs $(x_1', x_2')$, $(x_3', x_4')$ are independent. If (32) is true, the $\beta$'s are
independent, each one being normally distributed with mean 0 and variance 1.
Summarizing these results, and using Lemma 3: if $\varphi$ is the tetrad difference and
if $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$, the distribution of $n[\varphi - \varphi(P)]$ converges to a limiting
distribution. If in addition the distribution of $x_1, x_2, x_3, x_4$ is normal, or if the
pairs $(x_1, x_2)$, $(x_3, x_4)$ are independent, this limiting distribution has density function

$$\frac{c}{2}\, e^{-c|x|}$$

where $c = (1 - \rho_{12}^2)^{-1/2}(1 - \rho_{34}^2)^{-1/2}$.

Wilks has investigated the case where $x_1, x_2, x_3, x_4$ are normally and
independently distributed, and in this case found the exact variance of the tetrad
difference as a function of n.¹³

Columbia University


¹³ Proceedings of the National Academy of Sciences, Vol. 18 (1932), pp. 562-565.




ON THE POSTULATE OF THE ARITHMETIC MEAN 

By Richmond T. Zoch 

Introduction 

Suppose n observations have been made of an unknown quantity. It is
desired to know the most probable value of the unknown. When Gauss gave his
development of the so-called Normal Law of Error, he assumed that the Arithmetic
Mean of the n observations is the most probable value. The question arises: Can
this postulate be justified?

In the excellent book, entitled “Calculus of Observations,” by Whittaker and 
Robinson 1 there is given a proof which purports to deduce the postulate of the 
Arithmetic Mean from assumptions of a more elementary nature. This proof 
is not correct. 

Since this book has had wide circulation, it is believed that the errors in this 
proof should be called to the attention of the users of the book. The present 
paper has been prepared for this purpose. The first part of this paper points 
out the questionable features of the proof given in Whittaker and Robinson's 
book. The second part gives some critical comments on the original sources 
from which Whittaker and Robinson obtained their proof. 

Part 1 

The assumptions on which Whittaker and Robinson based their proof of the 
postulate of the Arithmetic Mean are: 

Axiom I. The differences between the most probable value and the
individual measures do not depend on the position of the null-point from which
they are reckoned.

Axiom II. The ratio of the most probable value to any individual measure
does not depend on the unit in terms of which the measures are reckoned.

Axiom III. The most probable value is independent of the order in which the 
measurements are made, and so is a symmetric function of the measures. 

Axiom IV. The most probable value, regarded as a function of the individual 
measures, has one-valued and continuous first derivatives with respect to them. 

It is fairly easy to show that if the Arithmetic Mean is the most probable
value, then the above four axioms follow as conclusions. The converse, viz. if
the above four axioms be assumed then the Arithmetic Mean is the most probable
value, however, is not true. That is to say the above assumptions are
necessary conditions, but not sufficient conditions. For, consider the following
function of the measures:

$$\frac{\mu_3}{\mu_2} = \frac{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where $\bar{x}$ is the Arithmetic Mean of the $x_i$.

1 The Calculus of Observations by E. T. Whittaker and G. Robinson, Blackie & Son, Ltd.,
London (1929), pp. 215-217.

Clearly this function is a symmetric function of the measures ($x_i$) and therefore
satisfies Axiom III. If the $x_i$ are each multiplied by $k$ then the Arithmetic
Mean ($\bar{x}$) is also multiplied by $k$ and we have

$$\frac{\dfrac{1}{n}\sum_{i=1}^{n}(kx_i - k\bar{x})^3}{\dfrac{1}{n}\sum_{i=1}^{n}(kx_i - k\bar{x})^2} = k\,\frac{\mu_3}{\mu_2};$$

that is to say, if we multiply the individual measures by $k$ it is the same as
multiplying the function $\mu_3/\mu_2$ by $k$ and therefore the ratio of any individual
measure to the most probable value (function) does not depend on the unit used.
Hence the function $\mu_3/\mu_2$ satisfies Axiom II.

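The partial derivative of $\mu_3/\mu_2$ with respect to $x_1$ is

$$\frac{\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\}\left[3\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\}\left\{-\frac{\partial \bar{x}}{\partial x_1}\right\} + 3(x_1 - \bar{x})^2\right] - \left\{\sum_{i=1}^{n}(x_i - \bar{x})^3\right\}\left[2\left\{\sum_{i=1}^{n}(x_i - \bar{x})\right\}\left\{-\frac{\partial \bar{x}}{\partial x_1}\right\} + 2(x_1 - \bar{x})\right]}{\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\}^2}$$

$$= \frac{3\mu_2(x_1 - \bar{x})^2 - 3\mu_2^2 - 2\mu_3(x_1 - \bar{x})}{n\mu_2^2}$$

since $\dfrac{\partial \bar{x}}{\partial x_1} = \dfrac{1}{n}$, and $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$. The partial derivatives of
$\mu_3/\mu_2$ with respect to each of the $x_i$ are of the same literal form and clearly
these partial derivatives are single valued and continuous. Therefore the
function $\mu_3/\mu_2$ satisfies Axiom IV.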

Now it can be shown that if $h$ be added to each $x_i$, then the function $\mu_3/\mu_2$ is
unchanged and hence this function does not satisfy Axiom I. (It should be
noted that the function $\mu_3/\mu_2$ is invariant under the transformation specified by
Axiom I.) However, consider the function $\bar{x} + a\,\mu_3/\mu_2 \equiv f$, where $a$ is a constant
independent of the $x_i$. Clearly, $f$ satisfies all of the four axioms.
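The claim that $f$ satisfies Axioms I to III can be checked numerically. The following is only a sketch: the helper names `central_moment` and `f`, the sample values, and the constants `h` and `k` are illustrative choices, not taken from the paper.

```python
from statistics import fmean

def central_moment(xs, r):
    """r-th central moment: (1/n) * sum((x - mean)**r)."""
    m = fmean(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

def f(xs, a=1.0):
    """Counterexample from the text: mean + a * mu3 / mu2.
    Undefined when all the measures are equal (mu2 = 0)."""
    return fmean(xs) + a * central_moment(xs, 3) / central_moment(xs, 2)

xs = [1.0, 3.0, 4.0]   # illustrative measures
h, k = 2.5, 7.0        # illustrative shift and scale factor

shifted = f([x + h for x in xs])   # Axiom I: expected to equal f(xs) + h
scaled = f([k * x for x in xs])    # Axiom II: expected to equal k * f(xs)
permuted = f([4.0, 1.0, 3.0])      # Axiom III: expected to equal f(xs)

print(f(xs), fmean(xs))            # f differs from the Arithmetic Mean
```

For these measures $f$ and the Arithmetic Mean differ by $10/21$, so the two functions are genuinely distinct.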

Thus a function, distinct from the Arithmetic Mean, has here been exhibited 
which satisfies the four axioms given in Whittaker and Robinson’s book. Hence, 
these four axioms are not sufficient to establish the postulate of the Arithmetic 
Mean. The question arises: Where is the proof given by Whittaker and Robinson
lacking in rigor? The proof given is essentially as follows. (No part of the
proof given by Whittaker and Robinson is here omitted; in fact, for the sake of 
rigor and careful reasoning, further explanations are given and the various steps 
are numbered.) 

(1) Suppose the most probable value is expressed in terms of the n measures
$x_1, x_2, \ldots, x_n$ by the function $\phi(x_1, x_2, \ldots, x_n)$; that is to say the most
probable value is some function, $\phi$, of the observations, or: the most probable
value $= \phi(x_1, x_2, \ldots, x_n)$.

(2) By the theorem of the mean value in the differential calculus, which by
Axiom IV is applicable, we have

$$\phi(kx_1, kx_2, \ldots, kx_n) = \phi(0, 0, \ldots, 0) + kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \cdots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right],$$

where the square brackets denote that every $x_i$ is to be replaced by $\theta k x_i$ where
$\theta$ lies between 0 and 1.

(3) By Axiom II, the left hand side $= k\phi(x_1, x_2, \ldots, x_n)$.

(4) By the continuity of $\phi$, postulated in Axiom IV the equation
$\phi(kx_1, kx_2, \ldots, kx_n) = k\phi(x_1, x_2, \ldots, x_n)$ must hold in the limit when $k$ is 0,
that is $\phi(0, 0, \ldots, 0) = 0$.

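(5) We now have

$$k\phi(x_1, x_2, \ldots, x_n) = kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \cdots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right]$$

or on dividing by $k$,

$$\phi(x_1, x_2, \ldots, x_n) = x_1\left[\frac{\partial\phi}{\partial x_1}\right] + \cdots + x_n\left[\frac{\partial\phi}{\partial x_n}\right].$$

(6) In this last equation let $k \to 0$: then each of the quantities
$\left[\dfrac{\partial\phi}{\partial x_i}\right]$ tends to a value which is independent of the x's and we can write
$\phi(x_1, x_2, \ldots, x_n) = c_1 x_1 + \cdots + c_n x_n$ where the c's are independent of the x's.

(7) By Axiom III the c's must all be equal, so

$$\phi(x_1, x_2, \ldots, x_n) = c(x_1 + x_2 + \cdots + x_n).$$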


(8) From Axiom I we have

$$\phi(x_1 + h, x_2 + h, \ldots, x_n + h) = \phi(x_1, x_2, \ldots, x_n) + h.$$

(9) If in this last equation we let the $x_i$ all approach zero then we have
$cnh = h$ and therefore $c = \dfrac{1}{n}$ and finally

$$\phi(x_1, x_2, \ldots, x_n) = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$

which states that the most probable value = the Arithmetic Mean.

It should be noted that the first six steps involve only Axioms II and IV. Of 
these first six steps the second and sixth are questionable. 

The sixth step involves the tacit assumption that the partial derivatives are
functions of $k$. These partial derivatives are not necessarily functions of $k$ and
the example given above, viz, $f = \bar{x} + a\,\mu_3/\mu_2$, is a function whose partial
derivatives are independent of $k$; in fact no function of the form

$$F = \bar{x} + a\,\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{r}}{\sum_{i=1}^{n}(x_i - \bar{x})^{r-1}}$$

will satisfy the tacit assumption involved in the sixth step; nor is $F$ the most
general function which will not satisfy the tacit assumption, thus take for example

$$P \equiv \bar{x} + \frac{a\,\mu_3\mu_4}{b\,\mu_2\mu_4 + c\,\mu_3^2}.$$


Consider now the second step. Take the function $\phi(y_1, y_2, \ldots, y_n) =
k\phi(x_1, x_2, \ldots, x_n)$. Then, by Axiom II, we have $y_i = kx_i$. Apply the Theorem
of the Mean Value to $\phi(y_i)$ instead of $\phi(x_i)$. Then

$$\phi(y_1, y_2, \ldots, y_n) = \phi(0, 0, \ldots, 0) + y_1\left[\frac{\partial\phi}{\partial y_1}\right]_\theta + \cdots + y_n\left[\frac{\partial\phi}{\partial y_n}\right]_\theta .$$

Now if we replace $y_i$ by $kx_i$ we obtain the equation given in the second step
except that the square brackets are now of the form

$$\left[\frac{\partial\phi(kx_1, kx_2, \ldots, kx_n)}{\partial(kx_i)}\right]_\theta \quad\text{and not}\quad \left[\frac{\partial\phi}{\partial x_i}\right]$$

as given by Whittaker and Robinson. It is difficult to decide whether by
$\left[\dfrac{\partial\phi}{\partial x_i}\right]$ Whittaker and Robinson mean

$$\left[\frac{\partial\phi(x_1, x_2, \ldots, x_n)}{\partial x_i}\right]_\theta \quad\text{or}\quad \left[\frac{\partial\phi(kx_1, kx_2, \ldots, kx_n)}{\partial x_i}\right]_\theta .$$

These last two expressions are not equal. To make the second step more clear
it is necessary to demonstrate that

$$\left[\frac{\partial\phi(kx_1, kx_2, \ldots, kx_n)}{\partial(kx_i)}\right] = \left[\frac{\partial\phi(x_1, x_2, \ldots, x_n)}{\partial x_i}\right],$$

and this has not been done. In order to demonstrate this equality further use
must be made of Axiom II. It appears that the questionable features of the
second step may be overcome by starting with the equation implied by Axiom
II, thus

$$\phi(kx_1, kx_2, \ldots, kx_n) = k\phi(x_1, x_2, \ldots, x_n);$$

in other words $\phi$ is a homogeneous function of degree 1. Therefore use can be
made of Euler's Theorem on homogeneous forms. In this way we obtain:

$$\phi = \sum_{i=1}^{n} x_i\,\frac{\partial\phi}{\partial x_i},$$

which is an abbreviation of the last equation given in the fifth step.
Now, making further use of Axiom II we have:

$$\frac{\partial\phi(kx_1, kx_2, \ldots, kx_n)}{\partial(kx_i)} = \frac{\partial}{\partial(kx_i)}\,k\phi(x_1, x_2, \ldots, x_n) = \frac{1}{k}\,\frac{\partial}{\partial x_i}\,k\phi(x_1, x_2, \ldots, x_n) = \frac{\partial\phi(x_1, x_2, \ldots, x_n)}{\partial x_i}.$$

It follows that

$$\frac{\partial\phi(x_1, x_2, \ldots, x_n)}{\partial x_i} = \frac{\partial\phi(kx_1, kx_2, \ldots, kx_n)}{\partial(kx_i)}.$$

From this development we conclude that for any function whatever which satisfies
Axiom II the last equation of the fifth step cannot possibly involve $k$.
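Both conclusions of this development (Euler's relation, and the fact that the partial derivatives do not change when every measure is multiplied by $k$) can be illustrated on the counterexample $f = \bar{x} + a\,\mu_3/\mu_2$. The sketch below uses central finite differences; the names `f` and `gradient`, the step size, and the sample values are assumptions made for illustration only.

```python
def f(xs, a=1.0):
    """Counterexample from the text: mean + a * mu3 / mu2."""
    n = len(xs)
    m = sum(xs) / n
    mu2 = sum((x - m) ** 2 for x in xs) / n
    mu3 = sum((x - m) ** 3 for x in xs) / n
    return m + a * mu3 / mu2

def gradient(func, xs, step=1e-6):
    """Central-difference estimate of the partial derivatives of func at xs."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += step
        down[i] -= step
        grads.append((func(up) - func(down)) / (2 * step))
    return grads

xs = [1.0, 3.0, 4.0]   # illustrative measures
k = 10.0               # illustrative change of unit

g_x = gradient(f, xs)
g_kx = gradient(f, [k * x for x in xs])

# Euler's theorem for a homogeneous function of degree 1
euler_sum = sum(x * g for x, g in zip(xs, g_x))

print(g_x, g_kx, euler_sum, f(xs))
```

Up to finite-difference error, `g_x` and `g_kx` agree and `euler_sum` reproduces `f(xs)`, which is what the argument above requires of any function satisfying Axiom II.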

In order to overcome the defect in the sixth step it is necessary to make a more
restrictive assumption. If in place of Axiom IV, we assume that “The most
probable value, regarded as a function of the individual measures, has first partial
derivatives with respect to them which are constant,” then the equation given in the
sixth step can be rigorously established.

After the equation of the sixth step is rigorously established there remains an
objection in the seventh step. The axioms do not explicitly state that the n
observations must be functionally independent. Therefore suppose the $x_i$ are
functionally dependent according to the relation $x_i = y_i z$ where the $y_i$ are all
constant. Then the function $f \equiv \bar{x} + \mu_3/\mu_2$ will have partial derivatives with
respect to the $x_i$ which are unequal and constant; yet at the same time the
function $f$ is a symmetrical expression of the n variables.

Hence in order to establish the postulate of the Arithmetic Mean along the
lines followed by Whittaker and Robinson it is necessary to make another restrictive
assumption slightly different from that proposed in the last paragraph but
one, and assume (in addition to Axioms I and II) that the function has partial
derivatives with respect to the $x_i$ which are equal.





Part 2 

The first original paper consulted was one by Schiaparelli. 2 In this paper nine
propositions are presented four of which are also called lemmas. From a strict
mathematical point of view the four propositions which Schiaparelli calls lemmas
are really postulates. Schiaparelli discusses these four lemmas at length; three
of these lemmas are the first three axioms given in Whittaker and Robinson’s
book. The fourth one is: “When, in the function $\phi$, all the variables ($x_i$) take
the same value $a$, the function itself becomes equal to $a$.” (This, as a matter of
fact, is the definition of an average).

In his discussion of these lemmas, which are based partly on practical and
partly on philosophical grounds, Schiaparelli points out that they are justified
from the practical or statistical nature of the problem involved in arriving at the
most probable value (Schiaparelli uses the term “true value”) of a set of
observations. In the present writer’s opinion, these discussions are the most
excellent part of Schiaparelli’s paper. These discussions are even more significant in
view of the fact that the later writers on this subject make no attempt whatsoever
to justify the use of their postulates.

Schiaparelli remarks that we should have no reason for not expecting that a
small change in a single observation should produce a small change in the
function $\phi$; but he does not make this remark in the form of an explicit postulate.
This could have been done and, moreover, such a postulate of continuity could
be justified from the practical nature of the problem. It seems that a more
elegant procedure would have been to deduce the continuity of the function and
its derivatives from Axioms I and II. It will be shown later that this is possible.
From his remark on the continuity of the function, Schiaparelli concludes that
the partial derivatives of $\phi$ with respect to the $x_i$ exist and are continuous. His
method of arriving at this conclusion is not valid, for it is well known that an
arbitrarily assumed function may be everywhere continuous and yet possess a
derivative at no point.

Schiaparelli’s Proposition III states: “When in the function $\phi$ all the $x_i$ take
the same value, then the $\dfrac{\partial\phi}{\partial x_i}$ become equal to each other.” This Proposition is
false. To show this, consider the function $f \equiv \bar{x} + \mu_3/\mu_2$, where the

$$\frac{\partial f}{\partial x_i} = \frac{1}{n} + \frac{3\mu_2\left[(x_i - \bar{x})^2 - \mu_2\right] - 2\mu_3(x_i - \bar{x})}{n\mu_2^2}.$$

2 Giovanni Schiaparelli, Come si possa giustificare l’uso della media aritmetica nel
calcolo dei risultati d’osservazione, Rendiconti Reale Istituto Lombardo di Scienze e Lettere,
Vol. XL (1907), pp. 752-764.

Now, when the $x_i$ all approach $a$ then both $f$ and $\dfrac{\partial f}{\partial x_i}$ become indeterminate
forms. However, in this case $f$ takes an indeterminate form which can be
evaluated and it can be shown that $\mu_3/\mu_2$ will always have the value zero, i.e., $f$
will have the value $a$ when all the $x_i = a$; while the $\dfrac{\partial f}{\partial x_i}$ can take any value
whatever and in general the $\dfrac{\partial f}{\partial x_i}$ will not be equal when the $x_i \to a$. To
illustrate: Consider the observations $y_1 = 1$, $y_2 = 3$, $y_3 = 4$; then $\bar{y} = 8/3$ and
$\mu_2 = 14/9$ and $\mu_3 = -20/27$ whence $f = 8/3 - 10/21$. Now assume that these
three observations all approach 2 in a certain way, i.e., let $x_i = 2 + (y_i - 2)z$.
Then $\bar{x} = 2 + (\bar{y} - 2)z = 2 + (2/3)z$,

$$\mu_2(x_i) = z^2\,\frac{1}{n}\sum (y_i - \bar{y})^2 = (14/9)z^2$$

and

$$\mu_3(x_i) = z^3\,\frac{1}{n}\sum (y_i - \bar{y})^3 = (-20/27)z^3$$

whence $f = 2 + (2/3)z - (10/21)z$. Clearly as $z \to 0$ the $x_i \to 2$ and $f \to 2$.
However,

$$\left.\frac{\partial f}{\partial x_1}\right]_{x_i = 2 + (y_i - 2)z} = \frac{1}{3} + \frac{131}{294}$$

$$\left.\frac{\partial f}{\partial x_2}\right]_{x_i = 2 + (y_i - 2)z} = \frac{1}{3} - \frac{253}{294}$$

$$\left.\frac{\partial f}{\partial x_3}\right]_{x_i = 2 + (y_i - 2)z} = \frac{1}{3} + \frac{122}{294}$$

Thus the $\dfrac{\partial f}{\partial x_i}$ are not functions of $z$ and as the $x_i \to 2$ the $\dfrac{\partial f}{\partial x_i}$ remain constant
and unequal.
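The three values quoted above can be reproduced in exact rational arithmetic. This is a sketch only: the function names `partials` and `approach` are hypothetical, and the derivative formula is the one stated in the text for $f = \bar{x} + \mu_3/\mu_2$.

```python
from fractions import Fraction

def partials(xs):
    """Exact partial derivatives of f = mean + mu3/mu2 at the point xs,
    using 1/n + (3*mu2*((x_i - mean)**2 - mu2) - 2*mu3*(x_i - mean)) / (n*mu2**2)."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu3 = sum((x - mean) ** 3 for x in xs) / n
    return [
        Fraction(1, n)
        + (3 * mu2 * ((x - mean) ** 2 - mu2) - 2 * mu3 * (x - mean)) / (n * mu2 ** 2)
        for x in xs
    ]

ys = [Fraction(1), Fraction(3), Fraction(4)]

def approach(z):
    """The path x_i = 2 + (y_i - 2) z along which the measures approach 2."""
    return [2 + (y - 2) * z for y in ys]

expected = [
    Fraction(1, 3) + Fraction(131, 294),
    Fraction(1, 3) - Fraction(253, 294),
    Fraction(1, 3) + Fraction(122, 294),
]

for z in (Fraction(1), Fraction(1, 10), Fraction(1, 10**6)):
    print(z, partials(approach(z)))
```

Every value of $z$ gives the same three unequal derivatives, and they sum to 1, in agreement with Proposition V.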

From his conclusion that the derivatives of $\phi$ exist and from Axiom I,
Schiaparelli obtains the equation, $\sum_{i=1}^{n} \dfrac{\partial\phi}{\partial x_i} = 1$, (this equation being his Proposition
V) in the following way: Since the derivatives of $\phi$ exist, then by the Theorem
of the mean value,

$$\phi(x_1 + h, x_2 + h, x_3 + h, \ldots, x_n + h) = \phi(x_1, \ldots, x_n) + h\left(\frac{\partial\phi}{\partial x_1} + \frac{\partial\phi}{\partial x_2} + \cdots + \frac{\partial\phi}{\partial x_n}\right). \qquad \text{(A)}$$

By Axiom I:

$$\phi(x_1 + h, x_2 + h, \ldots, x_n + h) = \phi(x_1, \ldots, x_n) + h.$$

Whence $\sum_{i=1}^{n} \dfrac{\partial\phi}{\partial x_i} = 1$. Now this equation is correct but the above proof of it
is not convincing. Clearly, according to the Theorem of the mean value, in
equation (A) it is necessary to replace each $x_i$ in the $\dfrac{\partial\phi}{\partial x_i}$ by $\theta x_i$ where $\theta$ is
between 0 and 1.


Schiaparelli’s Proposition VII states in effect that the $\dfrac{\partial\phi}{\partial x_i}$ are invariant under
the transformation $x_i' = x_i + h$ where $h$ is constant, and his Proposition IX
states that the $\dfrac{\partial\phi}{\partial x_i}$ are invariant under the transformation $x_i' = kx_i$ where $k$ is
a constant. These two propositions are correct and are correctly established.
Making use of his Propositions III (which is false), V, VII and IX, Schiaparelli
proceeds to the establishment of the postulate of the Arithmetic Mean, as
follows:

Let $a = \phi(x_1, x_2, \ldots, x_n)$. As the $x_i$ vary, then $a$ varies but for a particular set
of $x_i$ then $a$ is a constant. Now by Axiom I we have

$$a + (m - 1)a = \phi\bigl(x_1 + (m - 1)a,\; x_2 + (m - 1)a,\; \ldots,\; x_n + (m - 1)a\bigr) = ma$$

for all values of $m > 1$. Then by Axiom II:

$$a = \phi\left(\frac{x_1 + (m - 1)a}{m},\; \frac{x_2 + (m - 1)a}{m},\; \ldots,\; \frac{x_n + (m - 1)a}{m}\right)
= \phi\left(\frac{x_1 - a}{m} + a,\; \frac{x_2 - a}{m} + a,\; \ldots,\; \frac{x_n - a}{m} + a\right).$$

And by Propositions VII and IX, the $\dfrac{\partial\phi}{\partial x_i}$ are unchanged during the above
transformations. Hence the last equation is true when $m \to \infty$ and by Proposition
III (false) the $\dfrac{\partial\phi}{\partial x_i} = \dfrac{1}{n}$ as when $m \to \infty$, $\phi(x_i) = a$. In this final proof
Schiaparelli gives a geometric illustration of each step.

It is both interesting and strange to know that in closing his paper Schiaparelli
does not claim that the Arithmetic Mean is the only function which
will satisfy all of his postulates. In fact he himself points out that the function
$\phi$, implicitly defined by the equation $\sum_{i=1}^{n} (\phi - x_i)^m = 0$ where $m$ is an
odd integer > 1 will satisfy all of his postulates. Furthermore he points out
that this function will not satisfy his Proposition III. Schiaparelli’s object
was to establish the postulate of the Arithmetic Mean without any appeal to
the concept of probability. To accomplish this he made four assumptions each
of which he justified by a priori reasoning. Then he proceeded with the above
proof. Why he should have been satisfied with his own proof after perceiving
the function defined by $\sum (\phi - x_i)^m = 0$ is hard to understand.
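The implicitly defined function can be computed numerically and tested against the postulates. The sketch below finds the real root by bisection, which works because $\sum (\phi - x_i)^m$ is increasing in $\phi$ for odd $m$; the name `implicit_mean`, the tolerance, and the sample values are illustrative assumptions.

```python
def implicit_mean(xs, m=3, tol=1e-12):
    """Real root phi of sum((phi - x)**m for x in xs) = 0, m an odd integer.
    The sum is <= 0 at min(xs) and >= 0 at max(xs), so bisection applies."""
    if m % 2 == 0:
        raise ValueError("m must be odd")
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum((mid - x) ** m for x in xs) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

xs = [1.0, 3.0, 4.0]   # illustrative measures
h, k = 2.5, 7.0        # illustrative shift and scale factor

phi = implicit_mean(xs)
phi_shifted = implicit_mean([x + h for x in xs])   # expected: phi + h
phi_scaled = implicit_mean([k * x for x in xs])    # expected: k * phi
phi_permuted = implicit_mean([4.0, 1.0, 3.0])      # expected: phi
phi_equal = implicit_mean([2.0, 2.0, 2.0])         # expected: 2.0

print(phi, sum(xs) / len(xs))
```

For these measures the root lies noticeably below the Arithmetic Mean, so the function is a second solution of the postulates and not the mean in disguise.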





The second paper 3 consulted was also by Schiaparelli. It is merely an
abridged form of the one just discussed. Schiaparelli wrote two earlier papers on
this same subject (altogether Schiaparelli wrote four papers on it) but it was
inferred from the footnotes in his paper, which has just been discussed at length,
that it contained all of the material of the two earlier papers with which he himself
was satisfied. Therefore Schiaparelli's two earlier papers were not consulted.

The third paper consulted was that by Broggi. 4 Broggi states that the purpose
of his paper is to establish the postulate of the Arithmetic Mean by purely
analytic methods which are more brief than Schiaparelli’s method. Broggi
words the assumptions upon which he bases his proof as follows:

1. $\phi$ is a symmetric function of its n variables;

2. The partial derivatives are single-valued and finite;

3. We have $\phi(kx_1, kx_2, \ldots, kx_n) = k\phi(x_1, x_2, \ldots, x_n)$;

4. We have $\phi(x_1 + h, x_2 + h, \ldots, x_n + h) = \phi(x_1, x_2, \ldots, x_n) + h$, that is
to say for 2:

$$\frac{\partial\phi}{\partial x_1} + \frac{\partial\phi}{\partial x_2} + \cdots + \frac{\partial\phi}{\partial x_n} = 1. \qquad \text{(a)}$$


Broggi does not explain why he used the postulate 2 but presumably it was in
order to exclude the function defined by $\sum_{i=1}^{n} (\phi - x_i)^m = 0$. Consider the
special case where $m = 3$. Then $n\phi^3 - 3\phi^2 \sum x_i + 3\phi \sum x_i^2 - \sum x_i^3 = 0$. Let

$$p = 3\left(\frac{1}{n}\sum x_i^2 - \bar{x}^2\right) \quad\text{and}\quad q = \frac{3\bar{x}}{n}\sum x_i^2 - 2\bar{x}^3 - \frac{1}{n}\sum x_i^3 .$$

Also put $R = (p/3)^3 + (q/2)^2$ and let $A$ be the real cube root of $-q/2 + \sqrt{R}$ and
$B$ be the real cube root of $-q/2 - \sqrt{R}$. Then the three branches of $\phi$ can be
explicitly written

$$\phi_1 = A + B + \bar{x}$$
$$\phi_2 = \omega A + \omega^2 B + \bar{x}$$
$$\phi_3 = \omega^2 A + \omega B + \bar{x}$$

where $\omega$ and $\omega^2$ are the two complex cube roots of unity. Now while $\phi$ does not
satisfy the postulate that the function be single valued, $\phi_1$ satisfies this postulate
as well as all the others and so does $\phi_2$ and also $\phi_3$. Hence, Broggi’s failure to
comment at length on the function $\sum_{i=1}^{n} (\phi - x_i)^m = 0$ is unsatisfying. As a
matter of fact Broggi fails to point out any of the defects of Schiaparelli’s
paper, with the possible exception that he shows Schiaparelli’s postulate which
states $\phi = a$ when each of the $x_i = a$ to be a consequence of Axioms I and II.
This is done so casually that it makes one wonder whether Broggi really was
aware of the fact that Schiaparelli’s postulates are not independent.

3 Giovanni Schiaparelli, Come si possa giustificare l’uso della media aritmetica nel
calcolo delle misure, senza fare alcuna ipotesi sulla legge di probabilità degli errori
accidentali, Astronomische Nachrichten, Band 176 (1907) pp. 206-212.

4 Ugo Broggi, Sur le principe de la moyenne arithmétique, L’Enseignement
Mathématique, XI (1909) pp. 14-17.
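For $m = 3$ the real branch $\phi_1 = A + B + \bar{x}$ can be evaluated directly from the quantities $p$, $q$ and $R$ defined above. The following sketch assumes those definitions; the helper names `real_cbrt` and `phi1` and the sample values are illustrative, not Broggi's or Zoch's.

```python
import math

def real_cbrt(v):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def phi1(xs):
    """Real branch of the root of sum((phi - x)**3) = 0 by Cardano's formula."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum(x * x for x in xs) / n      # (1/n) * sum of squares
    s3 = sum(x ** 3 for x in xs) / n     # (1/n) * sum of cubes
    p = 3 * (s2 - xbar ** 2)
    q = 3 * xbar * s2 - 2 * xbar ** 3 - s3
    R = max((p / 3) ** 3 + (q / 2) ** 2, 0.0)   # non-negative up to rounding
    A = real_cbrt(-q / 2 + math.sqrt(R))
    B = real_cbrt(-q / 2 - math.sqrt(R))
    return A + B + xbar

xs = [1.0, 3.0, 4.0]                      # illustrative measures
root = phi1(xs)
residual = sum((root - x) ** 3 for x in xs)

print(root, residual)
```

Here $p = 3\mu_2$ and $q = -\mu_3$, so $R = \mu_2^3 + \mu_3^2/4$ cannot be negative and the real cube roots always exist; `phi1` is therefore single valued wherever it is defined.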

Broggi proves the Lemma: “A homogeneous function of the first degree which
is a solution of the equation of partial derivatives (a) is an integral function.”
This Lemma is correct and is correctly proved but its wording is apt to be
misleading; in fact it appears that its true meaning was not clear to Broggi himself.
For, while the function $\phi$ cannot be of the form $\dfrac{\psi}{\chi}$ where $\psi$ is a homogeneous
function of the $p$th degree which satisfies Axiom I and $\chi$ a homogeneous function
of the $(p - 1)$th degree which also satisfies Axiom I, the Lemma does not
mean and Broggi has not proved that $\phi$ cannot be of the form $\phi = \Omega + \dfrac{\psi}{\chi}$ where
$\Omega$ is an integral function satisfying Axioms I and II and $\psi$ and $\chi$ are homogeneous
functions of the $p$th and $(p - 1)$th degrees respectively which are invariant
under the transformation specified in Axiom I. By reason of this oversight,
Broggi concludes that any function satisfying Axioms I and II must be linear
in its n variables, a conclusion which is erroneous.

The fourth paper consulted was that by Schimmack. 5 Schimmack’s paper is
in three sections. The first section contains the proof which is essentially that
which Whittaker and Robinson give. In the second section Schimmack gives a
different proof, from a set of new postulates. The new set of postulates is:

Axiom I' = Axiom I.

Axiom II'. The most probable value is independent of the sense of direction
of the scale upon which the observed values (and the most probable value) are
reckoned, that is to say,

$$\phi(-x_1, -x_2, \ldots, -x_n) = -\phi(x_1, x_2, \ldots, x_n).$$

Axiom III' = Axiom III.

Axiom IV'. If from n observed values, the most probable value be computed
and if one obtains an additional observed value then the most probable value of
the n + 1 observed values is the same as the most probable value of n + 1
quantities consisting of the initial most probable value counted n times and the
(n + 1)th observed value, namely:

$$\phi_{n+1}(x_1, \ldots, x_{n+1}) = \phi_{n+1}(\phi_n, \ldots, \phi_n, x_{n+1}).$$

In explaining the object of this second section, Schimmack says that postulating
the existence of the derivatives (Axiom IV) seems unjustified and ought to
be avoided and only such axioms made which the intrinsic character of the problem
justifies. In connection with this statement of Schimmack’s it appears that
the intrinsic character of the problem certainly does not justify Axiom IV'. In
fact, Axiom IV' appears to be quite artificial. Moreover, Schimmack does not
attempt to justify Axiom IV' by a priori reasoning as Schiaparelli does for
Axioms I, II, and III. While, if the Arithmetic Mean is the most probable
value, Axiom IV' follows, since it is a property of the Arithmetic Mean, it does
not seem to be in keeping with the intrinsic character of the problem to use this
property as a starting point for later deductions.

5 Rudolf Schimmack, Der Satz vom arithmetischen Mittel in axiomatischer
Begründung, Mathematische Annalen, Band 68 (1909) pp. 125-132, 304.
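That Axiom IV' is a property of the Arithmetic Mean, and a restrictive one, can be seen on a small example. In this sketch the names `arithmetic_mean`, `f` and `satisfies_axiom_iv_prime` and the four sample values are hypothetical; `f` is the function $\bar{x} + \mu_3/\mu_2$ used earlier in the paper.

```python
def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def f(xs):
    """mean + mu3/mu2, the function used as a counterexample in Part 1."""
    n = len(xs)
    m = sum(xs) / n
    mu2 = sum((x - m) ** 2 for x in xs) / n
    mu3 = sum((x - m) ** 3 for x in xs) / n
    return m + mu3 / mu2

def satisfies_axiom_iv_prime(phi, xs, tol=1e-9):
    """Compare phi(x_1, ..., x_{n+1}) with phi(phi_n, ..., phi_n, x_{n+1}),
    where phi_n is phi applied to the first n values (one data set only)."""
    *first_n, last = xs
    phi_n = phi(first_n)
    replaced = [phi_n] * len(first_n) + [last]
    return abs(phi(xs) - phi(replaced)) <= tol

xs = [1.0, 3.0, 4.0, 8.0]   # illustrative: three measures plus one more

mean_ok = satisfies_axiom_iv_prime(arithmetic_mean, xs)
f_ok = satisfies_axiom_iv_prime(f, xs)
print(mean_ok, f_ok)
```

On this data set the Arithmetic Mean passes the check while $f$ does not, which illustrates how much Axiom IV' already presupposes.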

As regards Schimmack's objections to Axiom IV, all of the conditions specified
by it can be deduced from the first two Axioms except that the derivatives must
be single-valued. To show that this is true, consider an arbitrary function
which satisfies Axioms I and II. Let this function be $\phi(x_1, x_2, \ldots, x_n)$. We do
not know that $\phi$ is continuous or that $\phi$ has any derivatives. All we assume is
that $\phi$ satisfies the first three Axioms and it is here proven that $\phi$ must be
continuous and have continuous partial derivatives. By Axiom I we can give
increments to the $x_i$; hence we give each $x_i$ the same increment, $\Delta x$, and then
subtract $\phi$ and we have:

$$\phi(x_1 + \Delta x, x_2 + \Delta x, \ldots, x_n + \Delta x) - \phi(x_1, x_2, \ldots, x_n) = \Delta\phi$$

but by Axiom I, $\Delta\phi = \Delta x$. Therefore $\dfrac{\Delta\phi}{\Delta x} = 1 = \dfrac{d\phi}{dx}$. In other words, the
total derivative of $\phi$ exists and is constant. Therefore the total derivative of
$\phi$ is continuous. But since the total derivative exists, all of the partial
derivatives exist. By Axiom II, $\phi$ is a homogeneous function of the first degree.
Applying Euler's Theorem for homogeneous forms, we have

$$\phi = x_1\frac{\partial\phi}{\partial x_1} + x_2\frac{\partial\phi}{\partial x_2} + \cdots + x_n\frac{\partial\phi}{\partial x_n}.$$

Since the total derivative of $\phi$ is everywhere continuous,
$\phi$ is also everywhere continuous. Thus, the right hand side of the above equation
is everywhere continuous and each partial derivative is therefore everywhere
continuous.


As regards that part of Axiom IV which requires the $\dfrac{\partial\phi}{\partial x_i}$ to be single valued,
it would seem more satisfactory to postulate that the function $\phi$ is single-valued,
for the single-valuedness of a derivative does not insure the single-valuedness
of the integral while the single-valuedness of a function does insure the
single-valuedness of the derivative where the derivative exists.

In the third section of his paper, Schimmack shows Axioms I, II, III, and IV
to be independent, and likewise Axioms I', II', III' and IV'.

Schimmack does not mention any of the questionable features of Schiaparelli's 
and Broggi's papers. 

The fifth paper consulted was that by Suto. 6 Suto's assumptions are:

1°. $\phi(x, x, \ldots, x) = x$ (This is Schiaparelli's).

2°. $\phi(x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n) - \phi(x_1, x_2, \ldots, x_n)$ depends on the
values of $y_1, y_2, \ldots, y_n$ only.

3°. = Axiom III = Axiom III'.


6 Onosaburo Suto, Law of the Arithmetical Mean, Tohoku Mathematical Journal, Vol.
6 (1914) pp. 79-81.





Suto says he believes these assumptions to be more simple and natural than 
Schimmack’s Axioms I-IV'. However, assumption 2° appears to be quite 
artificial and very restrictive. Suto does not even attempt to justify it by a 
priori reasoning. 

Suto shows his three Axioms to be independent. It is interesting to know that 
Suto has established the postulate of the Arithmetic Mean rigorously using only 
three postulates while Schiaparelli, Broggi and Schimmack failed using four 
postulates. In this connection it should be observed that when Axiom IV as 
given by Whittaker and Robinson is replaced by “The most probable value, 
regarded as a function of the individual measures, has first partial derivatives 
with respect to them which are equal” as suggested at the end of Part 1, then 
Axiom III can be deduced as a consequence of Axioms I, II and the reworded 
Axiom IV, so that three Axioms only are sufficient to deduce the postulate of the 
Arithmetic Mean. However, it would be difficult to justify the reworded Axiom 
IV from the nature of this problem of the Arithmetic Mean. 

Suto does not point out any of the defects of the preceding papers. 

The last paper consulted was that by Beetle. 7 It deals with the third section
of Schimmack's paper. Beetle also fails to point out any of the defects of the
preceding papers.

Conclusion 

The postulate of the Arithmetic Mean can be rigorously established, without 
the use of the concept of probability, if sufficiently restrictive assumptions are 
made. The writers making sufficiently restrictive assumptions have failed to 
justify the use of them. Several proofs of the postulate of the Arithmetic 
Mean are clearly erroneous. The existing attempts to establish the postulate of 
the Arithmetic Mean without any appeal to the concept of probability are, 
therefore, unsatisfactory. 


Acknowledgment 

This paper was prepared under the direction of Dr. Frank M. Weida of the 
George Washington University. 

The George Washington University, 

Washington, D. C. 

7 R. D. Beetle, On the complete independence of Schimmack’s postulates for the
Arithmetic Mean, Mathematische Annalen, Band 76 (1915) pp. 444-446.



THE SHRINKAGE OF THE BROWN-SPEARMAN PROPHECY FORMULA

By Robert J. Wherry 


At the recent meeting of the Conference on Individual Psychological Differences
held in Washington, Dr. Clark Hull of Yale University called attention to
the fact that the much used Brown-Spearman formula involves, or leads to, if
used without regard to certain limitations, a certain over optimism. 1 In other
words, if only this formula is taken into account, one would assume that the mere
increasing in length of a test would automatically and, with continued increases
in length, indefinitely continue to increase its reliability or validity.

On the other hand, we know that the greater the number of test units the
greater the shrinkage between the predicted and actually obtained value. At
least we know this to be true when the value in question is a multiple correlation
coefficient and the test units are independent variables. Hull raised the question
as to whether or not the same fact might be true of the figures predicted by the
Brown-Spearman formula. It is the purpose of this article to show that this
shrinkage does occur, and that the Wherry-Smith shrinkage formula 2 satisfactorily
predicts this shrinkage.

A quick review of the nature of the two formulae (the Brown-Spearman and
the Wherry-Smith formulae) will at once show the importance of the discussion.
The Brown-Spearman formula, as applied to the predicting of reliability, reads
as follows,

$$R = \frac{M r_{11}}{1 + (M - 1)\,r_{11}}, \qquad (1)$$

where $R$ = the predicted reliability,
      $r_{11}$ = the discovered reliability,
and   $M$ = the number of times the test is lengthened. Thus the formula provides
that the predicted reliability ($R$) will increase with each increase in $M$, but it is
to be noted that the increase in $R$ decreases with each increase in $M$ as the value
of $R$ approaches its limit of plus one.

On the other hand the Wherry-Smith formula, which reads,

$$\bar{R}^2 = \frac{(N - 1)R^2 - (M - 1)}{N - M}, \qquad (2)$$

where $\bar{R}$ = the predicted value of the correlation,
      $R$ = the discovered correlation,
      $M$ = the number of independent variables
and   $N$ = the statistical population (the number of cases), provides that, for
each increase in $M$, the shrinkage in $\bar{R}$ as compared with $R$ increases. Thus, if
we assume that the M’s in the two formulae are analogous, i.e., if we assume the
Wherry-Smith formula to be applicable to the Brown-Spearman formula, we
see that as $M$ increases the Brown-Spearman formula adds a decreasing increment
while the Wherry-Smith formula provides that an increasing decrement be
subtracted, thus eventually we arrive at a point where by further increasing the
length of the test we will decrease rather than increase the size of the reliability
coefficient.


TABLE I

Correlations Observed and Theoretical (Based upon Observed Means)

(N = 37 throughout)

                          Correlation predicted            Error
 Number of   Observed    ----------------------   ----------------------
 judgments   average      Br.-Sp.      Wherry      Br.-Sp.      Wherry

 (Trait 1)
      1        .290
      5        .728                     .618        -.057       -.110
     10        .717                     .726         .086        .009
     15        .754                     .758         .106        .004
     20        .805        .891         .825         .087        .020
     30        .936        .925         .509        -.011       -.427

 (Trait 5)
      1        .419
      5        .736        .783         .751         .047
     10        .845        .878         .834         .033
     15        .887        .915         .856         .028
     20        .877        .935         .856         .058
     30        .876        .956         .745         .080

 (Trait 10)
      1        .354
      5        .479        .733         .692         .254        .213
     10        .717        .846         .788         .129        .071
     15        .852        .892         .816         .040       -.036
     20        .636        .915         .822         .279        .186
     30        .805        .943         .655         .138       -.150

 (All Traits)
      1        .320
     20                    .904         .822         .006
     30                    .933         .576         .061       -.296

If our hypothesis be true, we must, then, in order to predict the correct value
of $R$, substitute the value of equation (1) in equation (2). Doing this we have

$$\bar{R}^2 = \frac{(N - 1)M^2 r_{11}^2 - (M - 1)^3 r_{11}^2 - 2(M - 1)^2 r_{11} - (M - 1)}{(N - M)\left[1 + 2(M - 1)\,r_{11} + (M - 1)^2 r_{11}^2\right]} \qquad (3)$$

which would then be the form in which the Brown-Spearman formula should be
used in predicting reliability corrected for chance error by the Wherry-Smith
formula. The same result can of course be secured by applying the formulae
consecutively.


TABLE II

Error in Predicting Reliability (Based upon Observed Means)

        Error           Brown-Spearman     Wherry
    over     .210             2               1
    .151 -   .210                             1
    .091 -   .150             3
    .031 -   .090             8               1
   -.029 -   .030             3               6
   -.089 -  -.030             1               3
   -.149 -  -.090                             3
   -.209 -  -.150
    below   -.209                             2


TABLE III

Rietz Criteria of Normality Applied to Results from Means

 Criterion    Normal    Brown-Spearman    Wherry
    Ui          0            .074          -.032
    β1          0            .561          -.283
    β2          3           2.008          3.180
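Formulas (1), (2) and (3) are easy to check against one another numerically. The sketch below is an illustration under stated assumptions: the function names are invented here, and the inputs ($r_{11} = .290$, $N = 37$) are the Trait 1 figures of Table I.

```python
import math

def brown_spearman(r11, M):
    """Formula (1): predicted reliability of a test lengthened M times."""
    return M * r11 / (1 + (M - 1) * r11)

def wherry_smith_sq(R, N, M):
    """Formula (2): squared correlation after the shrinkage correction."""
    return ((N - 1) * R ** 2 - (M - 1)) / (N - M)

def combined_sq(r11, N, M):
    """Formula (3): formula (1) substituted into formula (2)."""
    d = M - 1
    numerator = (N - 1) * M ** 2 * r11 ** 2 - d ** 3 * r11 ** 2 - 2 * d ** 2 * r11 - d
    denominator = (N - M) * (1 + 2 * d * r11 + d ** 2 * r11 ** 2)
    return numerator / denominator

r11, N = 0.290, 37   # Trait 1, Table I

for M in (5, 10, 20):
    R = brown_spearman(r11, M)
    consecutive = wherry_smith_sq(R, N, M)   # formulae applied one after the other
    direct = combined_sq(r11, N, M)          # closed form (3)
    print(M, round(R, 3), consecutive, direct)
```

For $M = 5$ and $M = 10$ the square root of the corrected value rounds to .618 and .726, the Wherry entries of Table I, and for $M = 20$ formula (1) gives .891, the tabulated Brown-Spearman entry. The two routes to $\bar{R}^2$ agree to rounding error, as the text states.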

In order to test the formula (3), the writer has applied it to some empirical
data. A recent article by H. H. Remmers of Purdue University furnishes the
needed data. Remmers' study dealt with the increase in reliability due to increase
in the number of judgments of certain traits of college professors. 3 His
results, together with the results of applying formula (3) to the data are shown
in Table I.

An inspection of Table I shows at once that while the Brown-Spearman
formula gives results which are consistently too large (15 out of 17 times) the
Wherry-Smith formula gives results which are more nearly equally distributed
between positive and negative errors (7 to 10), tending to slightly underestimate.
The actual distribution of errors can be more easily seen by an inspection of
Table II.


TABLE IV

Correlations Observed and Theoretical (Based upon Observed Medians)

(N = 37 throughout)

                          Correlation predicted            Error
 Number of   Observed    ----------------------   ----------------------
 judgments   medians      Br.-Sp.      Wherry      Br.-Sp.      Wherry

 (Trait 1)
      1        .344
      5        .752        .724         .682        -.028       -.070
     10        .663        .840         .779         .177        .116
     15        .702        .887         .807         .185        .105
     20        .805        .913         .805         .108        .000
     30        .936        .940         .635         .004       -.301

 (Trait 5)
      1        .450
      5        .760        .804         .776         .040        .016
     10        .856        .891         .852         .035       -.004
     15        .931        .925         .873        -.006       -.058
     20        .877        .942         .874         .065       -.003
     30        .876        .961         .778         .085       -.098

 (Trait 10)
      1        .363
      5        .433        .740         .701         .307        .268
     10        .754        .851         .795         .097        .041
     15        .872        .895         .822         .023       -.050
     20        .898        .919         .820         .021       -.078
     30        .872        .945         .669         .073       -.203

 (All Traits)
      1        .503
     20        .898        .953         .879         .055       -.019
     30        .872        .968         .829         .096       -.043





Now, if our formula were perfectly correct, we should expect that the errors
incurred by its use would be normally distributed about a mean error of zero.
The Rietz criteria for normality of distribution were applied to these errors with
results as shown in Table III.⁴ It can be readily seen that the Wherry correction
formula gave much better results than did the uncorrected Brown-Spearman
formula when measured by the Rietz criteria.

All of the results in the first three tables are based upon the means of the
results obtained by Remmers, since this was the method used in his paper.
However, when the number of cases is small, as it was in this study, it is
sometimes preferable to use the median rather than the mean as a basis of
calculation, since the median is less affected by extreme cases. The writer has
therefore recalculated the problem on the basis of the medians discovered by
Remmers, and the results are given in Tables IV, V, and VI. The results were found
to differ but little from those based upon the means of the distributions.

TABLE V

Error in Predicting Reliability (Based upon Observed Medians)

Error               Brown-Spearman    Wherry
over .210                  1             1
 .151 to  .210             2
 .091 to  .150             3             2
 .031 to  .090             5             1
-.029 to  .030             6             5
-.089 to -.030                           5
-.149 to -.090                           1
-.209 to -.150                           1
below -.209                1             1


TABLE VI

Rietz Criteria of Normality Applied to Results from Medians

Criterion    Normal    Brown-Spearman    Wherry
Ui              0           .074         (illegible)
ft              0           .497         (illegible)
ft              3          1.599          2.284
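The predicted correlations in Table IV can be reproduced with a short sketch. It assumes that formula (3) is the Brown-Spearman prediction followed by the shrinkage correction R² = ((N - 1)R_bs² - (M - 1))/(N - M), which is how the formula is restated later in this paper; the function names are illustrative and not the author's.

```python
import math

def brown_spearman(r11, m):
    """Brown-Spearman prediction for a test lengthened m times."""
    return m * r11 / (1.0 + (m - 1) * r11)

def wherry_smith(r11, m, n):
    """Shrinkage-corrected prediction (assumed form of formula (3)).

    R^2 = ((n - 1) * R_bs^2 - (m - 1)) / (n - m), with n the number of cases.
    """
    r_bs = brown_spearman(r11, m)
    r_sq = ((n - 1) * r_bs ** 2 - (m - 1)) / (n - m)
    # A negative corrected R^2 has no real square root; report 0 in that case.
    return math.sqrt(r_sq) if r_sq > 0 else 0.0

N_CASES = 37      # "N = 37 throughout" in Table IV
R11_TRAIT1 = 0.344  # observed median for a single judgment, Trait 1

for m in (5, 10, 15, 20, 30):
    print(m,
          round(brown_spearman(R11_TRAIT1, m), 3),
          round(wherry_smith(R11_TRAIT1, m, N_CASES), 3))
```

For Trait 1 the printed pairs should agree with the Br.-Sp. and Wherry columns of Table IV to three decimals.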

If we now assume that formula (3) has been empirically established and
justified, we must still answer a very practical question, namely, “How long
shall we make our tests in order to achieve the greatest reliability?” To answer
this question we must find the point at which R becomes a maximum with
respect to changes in M, assuming $r_{11}$ and N to be constant terms. To find this
point we must find the derivative of equation (3) with respect to M and set the
numerator equal to zero. Thus, if we write formula (3) in a slightly more usable
form, we have

\[ R^2 = \frac{(N-1)M^2 r_{11}^2}{(N-M)(1 + 2[M-1]r_{11} + [M-1]^2 r_{11}^2)} - \frac{M-1}{N-M}, \]

whence

\[ \frac{dR}{dM} = \frac{(1 + [M-1]r_{11})\,\{4r_{11}^2 M^2 - (2Nr_{11}^2 - 3r_{11}[1-r_{11}])M + (1-r_{11})^2\}}{(N-M)^2\,(1 + 2[M-1]r_{11} + [M-1]^2 r_{11}^2)^2}, \tag{4} \]

which causes R to reach a maximum or minimum when the numerator is placed
equal to zero. Thus, placing the numerator equal to zero and factoring this
equation, we find its roots to be

\[ M = -\frac{1 - r_{11}}{r_{11}} \tag{5a} \]

or

\[ M = \frac{2Nr_{11} - 3(1-r_{11}) - \sqrt{4N^2r_{11}^2 - 12Nr_{11}(1-r_{11}) - 7(1-r_{11})^2}}{8r_{11}} \tag{5b} \]

\[ M = \frac{2Nr_{11} - 3(1-r_{11}) + \sqrt{4N^2r_{11}^2 - 12Nr_{11}(1-r_{11}) - 7(1-r_{11})^2}}{8r_{11}} \tag{5c} \]

and by substituting actual values of N and $r_{11}$ in the equations, we find that
equation (5c) is the root we are seeking (i.e., the value of M for which R
becomes a maximum).

TABLE VII

Showing the value of M which will give a maximum value for R
(According to the Brown-Spearman-Wherry-Smith formula)

                           r11
   N     .30   .40   .50   .60   .70   .80   .90
  10      3     4     4     4     5     5     5
  20      8     9     9    10    10    10    10
  30     13    14    14    14    15    15    15
  40     18    19    19    19    20    20    20
  50     23    24    24    24    25    25    25
  60     28    29    29    29    30    30    30
  70     33    34    34    34    35    35    35
  80     38    39    39    39    40    40    40
  90     43    44    44    44    45    45    45
 100     48    49    49    49    50    50    50

(Entries for r11 = .10 and .20 are not legible in this copy.)
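A minimal sketch of root (5c) follows, together with a numerical check that it does locate a maximum of the corrected R². The function names are hypothetical, and the corrected R² is taken in the restated form of formula (3) given in this paper.

```python
import math

def corrected_r_squared(n, m, r11):
    """Restated formula (3): shrinkage-corrected R^2 for a test lengthened m times."""
    r_bs = m * r11 / (1.0 + (m - 1) * r11)
    return ((n - 1) * r_bs ** 2 - (m - 1)) / (n - m)

def optimal_m(n, r11):
    """Root (5c); returns None when the quantity under the radical is negative."""
    a = 1.0 - r11
    radicand = 4 * n ** 2 * r11 ** 2 - 12 * n * r11 * a - 7 * a ** 2
    if radicand < 0:
        return None
    return (2 * n * r11 - 3 * a + math.sqrt(radicand)) / (8 * r11)

# A few entries of Table VII (N = 10), rounded to the nearest whole number.
for r11 in (0.40, 0.50, 0.60, 0.70, 0.80, 0.90):
    print(r11, round(optimal_m(10, r11)))

# The case of Table IV, Trait 1: N = 37, r11 = .344.
print(optimal_m(37, 0.344))
```

For small N and small $r_{11}$ the radicand can be negative, in which case no real maximum exists and the sketch returns `None`.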



It can also be readily seen that the quantity under the radical approximates a
perfect square, namely the square of the quantity outside the radical,
$2Nr_{11} - 3(1 - r_{11})$, falling short of it by $16(1 - r_{11})^2$, so that the radical
approximates this quantity for large values of N. Thus, when N is large (exceeds
100) we may secure satisfactory approximations to M if we rewrite equation (5c)
in the form below

\[ M \approx \frac{N}{2} - \frac{3(1 - r_{11})}{4r_{11}} \tag{5d} \]
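How close (5d) comes to (5c) can be checked numerically. The sketch below evaluates both for one illustrative pair of values (N = 200, $r_{11}$ = .50, chosen here only as an example); the function names are assumptions.

```python
import math

def m_exact(n, r11):
    """Root (5c)."""
    a = 1.0 - r11
    radicand = 4 * n ** 2 * r11 ** 2 - 12 * n * r11 * a - 7 * a ** 2
    return (2 * n * r11 - 3 * a + math.sqrt(radicand)) / (8 * r11)

def m_approx(n, r11):
    """Large-N approximation (5d): N/2 - 3(1 - r11)/(4 r11)."""
    return n / 2.0 - 3.0 * (1.0 - r11) / (4.0 * r11)

n, r11 = 200, 0.50
print(m_exact(n, r11), m_approx(n, r11))
```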


Table VII shows the results of equation (5c) for values between N = 10 and
N = 100 (by increments of 10) for values of $r_{11}$ from .10 to .90 (by increments of
.10). The use of the formula does not yield integers, and so the results in the
table are recorded to the nearest whole number rather than exactly as given by
the formula.

If, in order to test the validity of formula (5c), we apply it to the values in
Tables I and IV, we find fairly close agreement. The formula in each case
predicts a maximum value for R when M lies between 15 and 20, and in the actually
lengthened tests R is found to be a maximum when M is 30, 15, 15, 20, 30, 15, 20,
and 20, thus being in agreement six times out of eight.


Conclusions 

1. The Brown-Spearman formula appears to give results which contain both 
constant and chance errors. 

2. These errors can be practically eliminated by applying the Wherry-Smith
correction formula to the results obtained by the Brown-Spearman formula.

3. We may find the value of M which will give the greatest value of R by 
substitution in equation (5c) above, and then by substitution of this value in 
equation (3), find the most probable value of R at its maximum point. 

4. For large values of N we may secure satisfactory approximations to M by 
means of the simpler formula (5d). 

BIBLIOGRAPHY 

1. Hull, Clark: “Memorandum Concerning Factors Influencing the Prediction of
Performance,” Appendix F, Conference of Individual Psychological Differences,
National Research Council, Washington, D. C., 1930.

2. Wherry, Robert J.: “A New Formula for Predicting the Shrinkage of the Multiple
Correlation Coefficient,” Annals of Mathematical Statistics, November, 1931.

3. Remmers, H. H.: “The Equivalence of Judgments to Test Items in the Sense of the
Spearman-Brown Formula,” Journal of Educational Psychology, January, 1931.

4. Rietz, H. L.: Mathematical Statistics, Carus Mathematical Monograph, No. 3, Chicago,
1927, pp. 58-59.


Cumberland University 
Lebanon, Tennessee 



THE LIKELIHOOD TEST OF INDEPENDENCE IN CONTINGENCY 

TABLES 


By S. S. Wilks 

J. Neyman and E. S. Pearson¹ have applied the principle of the ratio of
likelihoods to the problem of determining criteria for testing various hypotheses about
the group frequencies in problems dealing with grouped data. In particular,
they have discussed the fundamental $\chi^2$ problem, the test of goodness of fit, the
hypothesis that two samples of grouped data are from the same population,
and the hypothesis of independence in contingency tables. In their treatment
of these problems, these authors have started from the limiting form of the
probability of an observed set of frequencies and have shown that approximately
each of the appropriate $\lambda$'s is a function of the minimum value of a corresponding
$\chi^2$. The distribution of this minimum value is found, from which the significance
test is made.

In certain cases the exact values of the $\lambda$'s are relatively simple functions of
the observations which can be as conveniently calculated as the corresponding
$\chi^2$'s. The purpose of this note is to consider the exact expressions for the $\lambda$'s
and find their asymptotic distributions in large samples for the following
hypotheses: (1) that a sample of grouped data is from a population with
specified group frequencies (i.e., the fundamental $\chi^2$ problem), (2) that several
samples of grouped data are from the same population, and (3) that there is
independence in a contingency table.


1. The fundamental $\chi^2$ problem. Let $p_1, p_2, \cdots p_k$ be the probabilities of the
mutually exclusive events $E_1, E_2, \cdots E_k$ respectively. In a sample of N events
the probability that $E_1, E_2, \cdots E_k$ will occur $n_1, n_2, \cdots n_k$ times respectively,
is given by

\[ C = \frac{N!}{n_1!\,n_2!\cdots n_k!}\;p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}. \tag{1} \]

If we let $\Omega$ be the class of all sets of values of the p's such that their sum is
unity, there is only one set of p's that maximize C, namely, $p_j = n_j/N$ (j = 1, 2,
$\cdots$ k). The maximum of C is

\[ C(\Omega\,\mathrm{max}) = \frac{N!}{n_1!\,n_2!\cdots n_k!}\;\frac{n_1^{n_1}n_2^{n_2}\cdots n_k^{n_k}}{N^N}. \tag{2} \]


1 Biometrika, vol. 20A (1928), pp. 263-294. 



The likelihood of the hypothesis that the sample is from a population
specified by p's having the values $p_1, p_2, \cdots p_k$ is defined as

\[ \lambda_s = \frac{C}{C(\Omega\,\mathrm{max})} = \left(\frac{Np_1}{n_1}\right)^{n_1}\left(\frac{Np_2}{n_2}\right)^{n_2}\cdots\left(\frac{Np_k}{n_k}\right)^{n_k}. \tag{3} \]

$\lambda_s$ is a quantity which clearly lies between 0 and 1. It will be 1 only when
$p_j = n_j/N$ (j = 1, 2, $\cdots$ k), (that is, when the hypothesis is rigorously
supported by the sample) and tends to 0 as the sample values $n_j/N$ diverge more
and more from the hypothetical values $p_j$. The problem of making an exact
test of significance of an observed value of $\lambda_s$ would involve the computation
of all terms of form (1) the n's of which make $\lambda_s$ less than the observed value of
$\lambda_s$. This, of course, is impracticable except perhaps for the binomial case with
small values of N. However, if the n's are large we can find an approximate
solution. If we let $x_j = (n_j - Np_j)/\sqrt{N}$ then except for terms of order $1/\sqrt{N}$ and
higher, the x's are distributed according to the law

\[ \frac{1}{\sqrt{(2\pi)^{k-1}\,p_1p_2\cdots p_k}}\;e^{-\frac{1}{2}\sum_j x_j^2/p_j} \tag{4} \]

where $\sum_j x_j = 0$. Neglecting terms of order $1/\sqrt{N}$ and higher we easily find
(using natural logarithms)

\[ -2\log\lambda_s = \sum_j \frac{x_j^2}{p_j}. \]

Therefore, if $\theta = -2\log\lambda_s$, $\theta$ is approximately distributed according to the function

\[ \frac{\left(\frac{1}{2}\right)^{\frac{k-1}{2}}}{\Gamma\!\left(\frac{k-1}{2}\right)}\;\theta^{\frac{k-3}{2}}\,e^{-\frac{1}{2}\theta} \tag{5} \]

which is the $\chi^2$ distribution with k - 1 degrees of freedom.

Since we have neglected terms of order $1/\sqrt{N}$ in obtaining (4) there is no
theoretical reason why $\chi^2$ should be used in preference to $-2\log\lambda_s$ as the
criterion for testing the hypothesis that the sample is from a population specified
by $p_1, p_2, \cdots p_k$. Any practical advantage which $-2\log\lambda_s$ may have will
therefore justify its use.
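The step from (3) to the quadratic form is stated without proof. A short expansion, offered here as a sketch of the omitted reasoning rather than as the author's own derivation, runs as follows, writing $n_j = Np_j + \sqrt{N}\,x_j$ and $u_j = x_j/(\sqrt{N}\,p_j)$:

```latex
\begin{align*}
-2\log\lambda_s
  &= 2\sum_j n_j \log\frac{n_j}{Np_j}
   = 2\sum_j \bigl(Np_j + \sqrt{N}\,x_j\bigr)\log(1 + u_j) \\
  &= 2\sum_j \bigl(Np_j + \sqrt{N}\,x_j\bigr)
       \Bigl(u_j - \tfrac{1}{2}u_j^2 + O(u_j^3)\Bigr) \\
  &= 2\sum_j \Bigl(\sqrt{N}\,x_j + \frac{x_j^2}{p_j} - \frac{x_j^2}{2p_j}\Bigr)
       + O\bigl(N^{-1/2}\bigr) \\
  &= \sum_j \frac{x_j^2}{p_j} + O\bigl(N^{-1/2}\bigr),
\end{align*}
```

where the term $2\sqrt{N}\sum_j x_j$ vanishes because $\sum_j x_j = 0$.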


2. The hypothesis that several samples of grouped data are from a common
population. Let $p_{i1}, p_{i2}, \cdots p_{is}$ be the probabilities with which the mutually
exclusive events $E_{i1}, E_{i2}, \cdots E_{is}$ occur, where $\sum_j p_{ij} = 1$ (i = 1, 2, $\cdots$ r). Then
in a sample of $N_i$ events the chance that $E_{i1}, E_{i2}, \cdots E_{is}$ will occur $n_{i1}, n_{i2}, \cdots n_{is}$
times respectively is given by an expression similar to (1). The chance of the
joint occurrence of the r samples is

\[ \frac{N_1!\,N_2!\cdots N_r!}{n_{11}!\,n_{12}!\cdots n_{rs}!}\;p_{11}^{n_{11}}p_{12}^{n_{12}}\cdots p_{rs}^{n_{rs}}. \tag{6} \]





We are interested in testing the hypothesis that the r samples are from the
same population, that is, that the r sets of p's $p_{i1}, p_{i2}, \cdots p_{is}$ (i = 1, 2, $\cdots$ r)
are the same. The likelihood criterion $\lambda_c$ appropriate to this hypothesis is the
ratio of the maximum ($\omega$(max)) of (6) subject to the condition that the sets of
p's are the same (that is, $p_{ij} = p_j$ say, i = 1, 2, $\cdots$ r; j = 1, 2, $\cdots$ s) to the
maximum ($\Omega$(max)) of (6) without this restriction.

For convenience let the observations be arranged in table form so that $n_{ij}$ is
the frequency in the i-th row and j-th column. Let $n_{i.}$ and $n_{.j}$ be the totals of
the i-th row and j-th column respectively, and N the total of all observations.
Thus $n_{i.}$ is the same as $N_i$. The expression for $\lambda_c$ will be

\[ \lambda_c = \frac{\prod_i n_{i.}^{\,n_{i.}}\;\prod_j n_{.j}^{\,n_{.j}}}{N^N\;n_{11}^{\,n_{11}}n_{12}^{\,n_{12}}\cdots n_{rs}^{\,n_{rs}}}. \tag{7} \]


It can be shown analytically that $\lambda_c$ lies between 0 and 1. It can be 1 only
when $n_{1j}/N_1 = n_{2j}/N_2 = \cdots = n_{rj}/N_r$, j = 1, 2, $\cdots$ s, that is, when the hypothesis of a
common population is perfectly substantiated by the samples. Because of the
fact that the $n_{ij}$ are integers, it is clear that $\lambda_c$ can be 1 only in exceptional
cases, but it can take on values arbitrarily near 1 for sufficiently large values of
the $n_{ij}$.

If the $N_i$ are large, the quantities $x_{ij} = (n_{ij} - N_ip_j)/\sqrt{N_i}$ are approximately
distributed according to the function

\[ F = \prod_{i=1}^{r}\left(\frac{1}{\sqrt{(2\pi)^{s-1}\,p_1p_2\cdots p_s}}\;e^{-\frac{1}{2}\sum_j x_{ij}^2/p_j}\right) \tag{8} \]


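where $\sum_j x_{ij} = 0$, i = 1, 2, $\cdots$ r. By neglecting terms of order $1/\sqrt{N}$ and
higher, we find that

\[ -2\log\lambda_c = \sum_{i,j}\frac{x_{ij}^2}{p_j} - \sum_{j=1}^{s}\frac{1}{Np_j}\Bigl(\sum_{i=1}^{r}\sqrt{N_i}\,x_{ij}\Bigr)^2. \tag{9} \]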


Denoting the quantity on the right side of (9) by $\chi_0^2$ it follows by straightforward
analysis that the characteristic function $\varphi(t)$ of $\chi_0^2$ defined by the r(s - 1)-tuple
integral $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{it\chi_0^2}\,F\,dx_{11}\cdots dx_{rs}$ has the value

\[ \left(\tfrac{1}{2}\right)^{\frac{(r-1)(s-1)}{2}}\left(\tfrac{1}{2} - it\right)^{-\frac{(r-1)(s-1)}{2}}. \tag{10} \]

But it is well known that (10) is the characteristic function of any quantity
distributed according to (5) with (k - 1) replaced by (r - 1)(s - 1). This, of
course, is the $\chi^2$ distribution with (r - 1)(s - 1) degrees of freedom.

It will be noticed that the exact value of $\lambda_c$ is a function of the observations
$n_{ij}$ which is independent of the p's, while the approximate value of $-2\log\lambda_c$
as given by (9) involves the p's. Before (9) could be used practically, one would
have to replace the p's by sample estimates, thus making further approximations
necessary in order to get the distribution. If the usual estimates $p_j = n_{.j}/N$
are used for the p's in $\chi_0^2$ we find that $\chi_0^2$ reduces to

\[ \sum_{i,j}\frac{\left(n_{ij} - \dfrac{n_{i.}n_{.j}}{N}\right)^2}{\dfrac{n_{i.}n_{.j}}{N}} \tag{11} \]

which is the familiar $\chi^2$ function for testing independence in contingency tables.
However, (11) differs from $\chi_0^2$ by terms of the same order (i.e., $1/\sqrt{N_i}$) as those
by which $\chi_0^2$ differs from $-2\log\lambda_c$. Since we have neglected terms of the same
order in obtaining (8), there is no theoretical reason why (11) should be used
rather than $-2\log\lambda_c$ for testing the hypothesis that the r samples are from a
common population.


3. The hypothesis of independence in contingency tables. We shall
consider a sample of N observations which can be arranged in a two-way
contingency table having r rows and s columns. Let $p_{ij}$ be the probability that an
observation will fall in the i-th row and j-th column. The probability that the
sample of N items will be distributed so that $n_{ij}$ will be the number falling in
the i-th row and j-th column (i = 1, 2, $\cdots$ r; j = 1, 2, $\cdots$ s) is given by

\[ \frac{N!}{n_{11}!\,n_{12}!\cdots n_{rs}!}\;p_{11}^{n_{11}}p_{12}^{n_{12}}\cdots p_{rs}^{n_{rs}}. \tag{12} \]

Here we are interested in testing the hypothesis that the classification by rows
is independent of the classification by columns, that is, that $p_{ij}$ is of the form
$p_iq_j$ where

\[ \sum_i p_i = 1, \qquad \sum_j q_j = 1. \tag{13} \]

For this hypothesis the appropriate likelihood criterion, say $\lambda_i$, is the ratio of
the maximum ($\omega$(max)) of (12) when $p_{ij} = p_iq_j$ restricted by the conditions
(13) to the maximum ($\Omega$(max)) of (12) subject only to the condition that
$\sum_{i,j} p_{ij} = 1$. $\lambda_i$ turns out to be identical with $\lambda_c$ in (7). When the hypothesis
of independence is true, the approximate distribution of the quantity $-2\log\lambda_i$
is the same as that of $-2\log\lambda_c$ when the hypothesis of a common population
is true. To show that the distributions are the same we note that by placing

\[ x_{ij} = \frac{n_{ij} - Np_iq_j}{\sqrt{N}} \tag{14} \]

we find from (12) that the $x_{ij}$ are approximately distributed according to the
function



\[ \frac{1}{(2\pi)^{\frac{rs-1}{2}}\,(p_1p_2\cdots p_r)^{\frac{s}{2}}\,(q_1q_2\cdots q_s)^{\frac{r}{2}}}\;e^{-\frac{1}{2}\sum_{i,j} x_{ij}^2/(p_iq_j)} \tag{15} \]

where $\sum_{i,j} x_{ij} = 0$. To the same degree of approximation we find


\[ -2\log\lambda_i = \sum_{i,j}\frac{x_{ij}^2}{p_iq_j} - \sum_i\frac{x_{i.}^2}{p_i} - \sum_j\frac{x_{.j}^2}{q_j} = \chi_0'^{\,2}. \tag{16} \]

Now the characteristic function of $\chi_0'^{\,2}$ can be shown without much difficulty to
be identical with that of $\chi_0^2$ as given by (10). The identity of the characteristic
functions of $\chi_0'^{\,2}$ and $\chi_0^2$ implies the identity of the asymptotic distributions of
$-2\log\lambda_c$ and $-2\log\lambda_i$. The problem of testing the hypothesis of a common
population in several samples of grouped data is mathematically equivalent to
that of testing the hypothesis of independence in contingency tables.

If the usual estimates $p_i = n_{i.}/N$, $q_j = n_{.j}/N$ are used in (16) we find that $\chi_0'^{\,2}$
becomes the expression given by (11). But (11) differs from $\chi_0'^{\,2}$ by terms of
order $1/\sqrt{N}$ and higher. Therefore, $-2\log\lambda_i$ and (11) can differ from each
other only by terms of order $1/\sqrt{N}$, which is the order of approximation involved
in getting (15) from (12). Thus, $-2\log\lambda_i$ has as much validity as the usual
criterion (11) for testing for independence in contingency tables.

The $\lambda_i$ method can easily be extended to the case of contingency tables of
higher order. For example, in a three-way table of r rows, s columns and t
layers in which $n_{ijk}$ is the number of items observed in the i-th row, j-th column
and k-th layer, the $\lambda_i$ criterion for testing the hypothesis of independence, that
is, that the probabilities $p_{ijk}$ are of the form $p_{1i}p_{2j}p_{3k}$, is such that

\[ -2\log\lambda_i = 2\sum_{i,j,k}(n_{ijk}\log n_{ijk}) + 4N\log N - 2\sum_i(n_{i..}\log n_{i..}) - 2\sum_j(n_{.j.}\log n_{.j.}) - 2\sum_k(n_{..k}\log n_{..k}) \tag{17} \]

where $n_{i..} = \sum_{j,k} n_{ijk}$ and so on. $-2\log\lambda_i$ in this case is approximately
distributed like $\chi^2$ with rst - r - s - t + 2 degrees of freedom.
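Formula (17) is straightforward to evaluate by machine. The following is a minimal sketch, assuming the table is held as a nested list `table[i][j][k]` and that a cell count of zero contributes nothing (0 log 0 taken as 0); the function name and the small example table are hypothetical.

```python
import math

def xlogx(n):
    """n * log(n) with natural logarithms; 0 log 0 is taken as 0."""
    return n * math.log(n) if n > 0 else 0.0

def neg2_log_lambda_3way(table):
    """Return (-2 log lambda, degrees of freedom) for a three-way table, per (17)."""
    r, s, t = len(table), len(table[0]), len(table[0][0])
    cells = [table[i][j][k] for i in range(r) for j in range(s) for k in range(t)]
    total = sum(cells)
    rows = [sum(table[i][j][k] for j in range(s) for k in range(t)) for i in range(r)]
    cols = [sum(table[i][j][k] for i in range(r) for k in range(t)) for j in range(s)]
    layers = [sum(table[i][j][k] for i in range(r) for j in range(s)) for k in range(t)]
    stat = (2 * sum(xlogx(n) for n in cells)
            + 4 * xlogx(total)
            - 2 * sum(xlogx(n) for n in rows)
            - 2 * sum(xlogx(n) for n in cols)
            - 2 * sum(xlogx(n) for n in layers))
    return stat, r * s * t - r - s - t + 2

# Hypothetical 2 x 2 x 2 table built as an exact product a_i * b_j * c_k,
# so the three classifications are perfectly independent.
a, b, c = (1, 2), (1, 3), (2, 1)
independent = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
print(neg2_log_lambda_3way(independent))
```

For a table that factorises exactly, as in the example, the statistic is zero up to rounding error, which is a convenient sanity check on the signs in (17).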

4. Illustrative examples. To illustrate the use of $\lambda_s$ we shall consider the
following example given by R. A. Fisher² dealing with de Winton and Bateson's
data on results of interbreeding the hybrid (F₁) generation of Primula in which
two factors are considered.

                       Flat Leaves                 Crimped Leaves
                  Normal    Primrose Queen     Lee's    Primrose Queen
                   Eye           Eye            Eye          Eye          Total
Observed (n_i)      328          122             77           33           560
Expected (Np_i)     315          105            105           35           560

² Statistical Methods for Research Workers, 4th ed., p. 84.







If the two factors are Mendelian, that is, segregate independently, the four
classes of offspring resulting from interbreeding the F₁ generation are expected
to appear in the ratio 9:3:3:1 (assuming all classes equally viable). We wish to
test the hypothesis of a 9:3:3:1 ratio. It is found that

\[ -2\log_e\lambda_s = 2\log_e 10\left[\sum_i n_i\log_{10}n_i - \sum_i n_i\log_{10}(Np_i)\right] = 11.50. \]

Entering Fisher's $\chi^2$ table for n = 3, we find that the chance of exceeding the
value 11.50 is less than .01, which is significant if we take P = .05 as the critical
level of significant deviation. Thus, the observed frequencies cannot be
reasonably explained as chance deviations from the 9:3:3:1 ratio.

The usual $\chi^2$ method gives $\chi^2$ = 10.87 and n = 3 for the 9:3:3:1 hypothesis.
The value of P in this case lies between .01 and .02. It follows from the
theoretical discussion that 10.87 has no greater validity than 11.50 in testing this
hypothesis.

We shall illustrate the use of $\lambda_c$ by using another example given by Fisher
dealing with Wachter's data for back-crosses in mice.



                   Black    Black      Brown    Brown
                   Self     Piebald    Self     Piebald    Total
Coupling:
  F₁ Males           88       82         75       60         305
  F₁ Females         38       34         30       21         123
Repulsion:
  F₁ Males          115       93         80      130         418
  F₁ Females         96       88         95       79         358
Total               337      297        280      290        1204


The back-crosses were made according as the male or female parents of the
F₁ generation were heterozygous in the two factors Black-Brown, Self-Piebald,
and according to whether the two dominant genes came both from one parent
(Coupling) or one from each parent (Repulsion). We wish to test the
hypothesis that the proportions are independent of the matings used. We find

\[ -2\log_e\lambda_c = 2\log_e 10\left[\sum_{i,j} n_{ij}\log_{10}n_{ij} + N\log_{10}N - \sum_i n_{i.}\log_{10}n_{i.} - \sum_j n_{.j}\log_{10}n_{.j}\right] = 21.69. \]

Entering Fisher's $\chi^2$ table for n = 9 we find that the chance of exceeding this
value is less than .01. The departure from the hypothesis of independence is
significant on the basis of the P = .05 level. The $\chi^2$ method gives the remarkably
close result $\chi^2$ = 21.83, which, with n = 9, gives P < .01.
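Both illustrative examples can be recomputed in a few lines. The sketch below uses natural logarithms directly instead of the common-logarithm form given in the text; the function names are illustrative, and the data are the Primula and mice counts tabulated in this section.

```python
import math

def xlogx(n):
    """n * log(n) with natural logarithms; 0 log 0 is taken as 0."""
    return n * math.log(n) if n > 0 else 0.0

def neg2_log_lambda_s(observed, probs):
    """-2 log(lambda_s) = 2 * sum of n_j * log(n_j / (N p_j))."""
    n = sum(observed)
    return 2.0 * sum(o * math.log(o / (n * p)) for o, p in zip(observed, probs) if o > 0)

def chi2_specified(observed, probs):
    """Usual chi-square against specified group probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def neg2_log_lambda_c(table):
    """-2 log(lambda_c) from formula (7) for a table given as a list of rows."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    cells = sum(xlogx(x) for row in table for x in row)
    return 2.0 * (cells + xlogx(n) - sum(map(xlogx, rows)) - sum(map(xlogx, cols)))

def chi2_table(table):
    """Usual chi-square (11) for independence in a two-way table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    return sum((x - ri * cj / n) ** 2 / (ri * cj / n)
               for row, ri in zip(table, rows) for x, cj in zip(row, cols))

primula = [328, 122, 77, 33]
ratio = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
mice = [[88, 82, 75, 60],
        [38, 34, 30, 21],
        [115, 93, 80, 130],
        [96, 88, 95, 79]]

print(neg2_log_lambda_s(primula, ratio), chi2_specified(primula, ratio))
print(neg2_log_lambda_c(mice), chi2_table(mice))
```

The degrees of freedom are 3 for the first example and (4 - 1)(4 - 1) = 9 for the second, as in the text.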

5. Summary. We have considered the exact expressions for the Neyman-
Pearson $\lambda$ criteria appropriate to the following hypotheses: (1) that a sample
of grouped data is from a population with specified group proportions (the
fundamental $\chi^2$ problem), (2) that several samples of grouped data are from a
common population, (3) that there is independence in a contingency table. The
quantity $-2\log\lambda$ for each of these cases is approximately distributed like $\chi^2$,
the number of degrees of freedom being given in each case. It is shown that the
usual $\chi^2$ method of testing these hypotheses has no greater theoretical validity
than the $\lambda$ method. On the practical side, it is to be remarked that $-2\log\lambda$
can be computed with fewer operations than $\chi^2$. Two examples are given to
illustrate the practical application of the $\lambda$ method.

Princeton University.



THE PROBABILITY THAT THE MEAN OF A SECOND SAMPLE WILL 
DIFFER FROM THE MEAN OF A FIRST SAMPLE BY LESS THAN 
A CERTAIN MULTIPLE OF THE STANDARD DEVIATION OF 
THE FIRST SAMPLE 

By G. A. Baker, Ph.D. 

The following statement of the significance of a probable error is often made: 
“The probable error of the mean is a value above and below the mean such that 
if the test were repeated under the same conditions there would be, on the 
average, equal chances that the mean would fall within or without this range.”
The probable error is attached to the mean of the sample and it is assumed that 
the standard deviation of the sample is that of the sampled normal population. 
This was formerly a very usual explanation of the meaning of probable error by 
research workers, but it is inaccurate and misleading, especially for samples of 
20 or less such as are dealt with in agricultural experiments. The inaccuracy of 
this explanation of the meaning of probable error has been realized for many 
years by competent statisticians, but no satisfactory treatment has heretofore 
been devised. 1 

The attempted explanation of the probable error in terms of the expected 
frequency of the occurrence of different size deviations of the means of future 
samples from the sample mean does raise a very interesting, important, and 
legitimate question, namely, what is the probability of a second mean lying within 
a certain multiple of the standard deviation of a first sample of the mean of a 
first sample? This question is of fundamental concern to those engaged in 
experimental work. Its answer will indicate to investigators reasonable devia¬ 
tions from the results of their first experiments, will form a valid basis for the 
rejection of doubtful observations or groups of such observations, and will form 
a basis for a test of the significance of the divergence of results in different 
experiments. It is found that the usual method of treating the probable error 
gives an overly optimistic idea of the smallness of the deviations that may be 
expected in future samples. 

The distribution function of the variable

\[ v = \frac{x - z}{y} \]

where x is the mean of the first sample, z is the mean of the second sample,
and y is the standard deviation of the first sample, is obtained in this paper.
The sampled population is assumed to be normal.

1 Camp, Burton H. “Suggested Problems for Mathematical Research,” Journal Amer¬ 
ican Statistical Association , Supplement Vol. 30, No. 189A, Mar. 1935, p. 259, No. 5. 



Let the sampled population be represented by

\[ f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}, \qquad -\infty \le x \le \infty. \tag{1} \]

If a sample of n is drawn from (1) the means, as is well known, will be distributed
as proportional to

\[ e^{-\frac{1}{2}nx^2}, \qquad -\infty \le x \le \infty, \tag{2} \]

and the standard deviation will be distributed as proportional to

\[ y^{n-2}\,e^{-\frac{1}{2}ny^2}, \qquad 0 \le y \le \infty. \tag{3} \]

If a second sample of n is drawn from (1) its mean will be distributed as
proportional to

\[ e^{-\frac{1}{2}nz^2}, \qquad -\infty \le z \le \infty. \tag{4} \]

Consider the expression

\[ \frac{x - z}{y} \tag{5} \]

and call it v. Then v is the difference between the means of the two samples
measured in terms of the standard deviation of the first sample. The
distribution function of v is sought.

The three variables x, y, and z are independent. Let y, for the moment, have
a constant value and write

\[ vy = x - z. \tag{6} \]

The probability of a given value of vy in d(vy) for a given value of y is now being
sought, that is, y is regarded as constant. This probability is proportional to

\[ \left[\int_{-\infty}^{\infty} e^{-\frac{1}{2}n(vy+z)^2 - \frac{1}{2}nz^2}\,dz\right] d(vy) \tag{7} \]

from the application of the following

Lemma. If x and y are independent variables, $-\infty \le x \le \infty$, $-\infty \le y \le \infty$,
and the probability of an x in dx is f(x)dx and the probability of a y in dy is
$\varphi(y)dy$, then the probability of v = y - x in dv is proportional to²

\[ \left[\int_{-\infty}^{\infty} f(x)\,\varphi(v + x)\,dx\right] dv. \]

Thus the probability of a value of v in dv for a given y is proportional to

\[ y^{n-1}\,e^{-\frac{1}{2}n\left[1 + \frac{1}{2}v^2\right]y^2}\,dv \tag{8} \]

² Baker, G. A. “Random Sampling from Non-Homogeneous Populations,” Metron,
Vol. 8, No. 3, Feb. 1930, p. 68 (slightly modified).





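since d(vy) = y dv for y constant. Hence the total probability of a particular
value of v in dv will be given as proportional to

\[ \left[\int_0^{\infty} y^{n-1}\,e^{-\frac{1}{2}n\left[1 + \frac{1}{2}v^2\right]y^2}\,dy\right] dv \tag{9} \]

which is proportional to

\[ \frac{dv}{\left(1 + \dfrac{v^2}{2}\right)^{\frac{n}{2}}}. \tag{10} \]

If the number in the first sample is $n_1$ and the number in the second sample is
$n_2$, then (10) becomes

\[ \frac{dv}{\left(1 + \dfrac{n_2}{n_1 + n_2}\,v^2\right)^{\frac{n_1}{2}}}. \tag{11} \]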

This distribution, (11), permits an answer to be given to the question, what is 
the probability that the mean of a sample of a given size n 2 will differ from the 
mean of a first sample of size n\ by as much as a constant multiple of the standard 
deviation of the first sample? Thus, this distribution gives a clear and compre¬ 
hensible indication of the expected conformity of future experiments and gives 
a valuable test for the significance of the difference between two means. If it is
desired to use this distribution as a rejection criterion, $n_1$ should be taken so as
to include as many items as possible and so as to exclude the doubtful ones.
The doubtful items should be included in the second sample. If the original 
sample is broken up into two or more samples it must be done in such a way as 
not to destroy the randomness of the resulting parts. 
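Distribution (11) can be checked against a small simulation. The sketch below is an illustration, not part of the paper: it assumes unit-normal sampling, a standard deviation computed with divisor $n_1$ (the convention implied by (3)), and evaluates the probability under (11) through the substitution $v = k\tan\theta$ with $k^2 = (n_1 + n_2)/n_2$, which turns the integrand into $\cos^{n_1-2}\theta$. Function names, trial count and seed are arbitrary choices.

```python
import math
import random

def prob_within(c, n1, n2, steps=2000):
    """P(|v| < c) under distribution (11), by Simpson's rule after v = k*tan(theta)."""
    k = math.sqrt((n1 + n2) / n2)

    def simpson(upper):
        # Integral of cos(theta)**(n1 - 2) over [0, upper]; steps must be even.
        h = upper / steps
        total = 1.0 + math.cos(upper) ** (n1 - 2)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * math.cos(i * h) ** (n1 - 2)
        return total * h / 3.0

    return simpson(math.atan(c / k)) / simpson(math.pi / 2)

def simulate(c, n1, n2, trials=20000, seed=1):
    """Empirical fraction of trials in which |mean1 - mean2| < c * sd1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        second = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        mean1 = sum(first) / n1
        mean2 = sum(second) / n2
        # Standard deviation of the first sample with divisor n1, matching (3).
        sd1 = math.sqrt(sum((x - mean1) ** 2 for x in first) / n1)
        if abs(mean1 - mean2) < c * sd1:
            hits += 1
    return hits / trials

print(prob_within(1.0, 5, 3), simulate(1.0, 5, 3))
```

The two printed numbers should agree to within ordinary sampling error.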

Example. Suppose for the purpose of illustration that a sample of four is to
be considered. The proper v-distribution is

\[ \frac{\sqrt{2}}{\pi}\,\frac{dv}{\left(1 + \dfrac{v^2}{2}\right)^{2}}. \]

The value of v which is necessary to give a probability of one-half is a root of

\[ \tan^{-1}\frac{p}{\sqrt{2}} + \frac{1}{2}\,\frac{\sqrt{2}\,p}{p^2 + 2} = \frac{\pi}{4} \]

which is .9. That is, an interval of 1.8 times the standard deviation of the
sample of four with center at the mean of the sample is necessary for a
probability of one-half that the mean of the next sample of four will lie in this interval.
This compares with .75 times the standard deviation of the sample if
$\sigma/\sqrt{n-1}$ is used as the probable error of the mean and with .65 times the
standard deviation of the sample if $\sigma/\sqrt{n}$ is used as the probable error of the
mean. The last two methods of calculating a
probable error with the interpretation indicated at the beginning of this paper
give the investigator an unwarranted feeling of assurance about the agreement
of future samples with a first sample.
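The half-probability point in the example can be recomputed directly. The sketch below is an independent check, not the author's calculation: it integrates the density $(\sqrt{2}/\pi)(1 + v^2/2)^{-2}$ in closed form, giving $P(|v| < p) = (2/\pi)[\tan^{-1}(p/\sqrt{2}) + \sqrt{2}\,p/(p^2 + 2)]$, and solves for the point of probability one-half by bisection. Function names are hypothetical.

```python
import math

def prob_within_n4(p):
    """P(|v| < p) for two samples of four, from integrating the density in closed form."""
    return (2.0 / math.pi) * (math.atan(p / math.sqrt(2.0))
                              + math.sqrt(2.0) * p / (p * p + 2.0))

def solve_for_probability(target, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection on the increasing function prob_within_n4."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_within_n4(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(solve_for_probability(0.5))
print(prob_within_n4(0.9))
```

Under these assumptions the bisection returns about 0.62 rather than the .9 quoted in the example, and the probability attached to .9 comes out near .65, so the printed figures may be worth rechecking. The comparison with the probable-error intervals is not reversed by this: an interval of about 1.25 standard deviations is still wider than .75 or .65.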

If two samples of $n_1$ and $n_2$ are drawn from the normal population, (1), then
these samples can be combined for the purpose of calculating a standard
deviation and the difference between the means of the samples can be measured in
terms of the standard deviation of the combined sample. The distribution
function of the difference of the means divided by the standard deviation of the
combined sample is

\[ \frac{dv}{\left(1 + \dfrac{n_1n_2}{(n_1 + n_2)^2}\,v^2\right)^{\frac{n_1 + n_2}{2}}}. \tag{11'} \]

This distribution, (11'), is the basis for a valid test for the significance of the
difference between two means. If either this test or the test based on
distribution (11) shows a significant difference between the means it can not be ignored.
“Student's” t-distribution is proportional to

\[ \frac{dt}{\left(1 + \dfrac{t^2}{N - 1}\right)^{\frac{N}{2}}}. \tag{12} \]

The above distributions can be easily transformed into t-distributions so that
“Student's” tables can be used. For instance, if we put

\[ v = \frac{\sqrt{2}\;t}{\sqrt{n - 1}}, \qquad N = n, \]

then (10) becomes proportional to (12). Again, put

\[ v = \frac{\sqrt{n_1 + n_2}\;t}{\sqrt{n_2}\,\sqrt{n_1 - 1}}, \qquad N = n_1, \]

and (11) becomes proportional to (12). Finally, put

\[ v = \frac{(n_1 + n_2)\;t}{\sqrt{n_1n_2}\,\sqrt{n_1 + n_2 - 1}}, \qquad N = n_1 + n_2, \]

and (11') becomes proportional to (12).


Summary. The distributions found for the difference of the means of two
samples in terms of a standard deviation of one sample or combination of both
samples are similar to and easily transformed into “Student's” t-distribution so
that his tables can be used. However, these distributions answer a practical,
interesting, and important question that “Student's” t-distribution does not.
If in an experimental science a series of observations is made it is desirable to
know how much a similar series of observations could be expected to differ from
the set of observations now available. This deviation, if it is to mean anything,
must be expressed in terms of quantities available from the observations already
made. This paper gives the probability function of a deviation in the mean of
a future sample measured from the mean of a first sample and measured in terms
of the standard deviation of a first sample, that is, in terms of quantities known
from the first sample. It is a very definite advantage and a great gain in
assurance to know the point from which measurements are being made and the unit
in which they are expressed instead of making vague, ill-defined assumptions
about the zero point and unit length of the measuring scale. It is true that
differences that were formerly considered significant may not be so considered
now. But these differences would appear insignificant if experiments were
sufficiently repeated, so that the net result is fewer inconsistencies to explain
away.



ON SAMPLES FROM A MULTIVARIATE NORMAL POPULATION 1 

By Solomon Kullback 

1. Introduction. In this paper we shall discuss the distribution of certain 
functions calculated for samples drawn from a multivariate normal population. 
The method of solution is based on the theory of characteristic functions and 
presents further application of that theory to the distribution problem of 
statistics. 2 

We shall have occasion to refer to the multivariate normal population whose 
distribution law is given by 

(1.1) F(x) SS 7T-*' 2 I B vq I 1 / 2 *-»> (p, q = 1, 2, ... , to) 

where B(x — m, x — m) is the real, positive definite quadratic form of the 
x p — m P with matrix || B pq ||. Here m p is the mean in the population of the pth 
variate and B pq = A pq /2o p o q A where o p is the standard deviation in the popu¬ 
lation of the pth variate; A is the determinant of population correlations p pq = p qp ; 
A pq is the co-factor of p pq in A; and | B pq | is the determinant of the matrix || B vq ||. 

Since the integral of (1.1) over the entire field of variation of the variables is 
unity, we have (using abbreviated notation) 

(1.2) J dx = 7r" /2 

Equation (1.2) will be true if || B pq || is complex, provided its real part is sym¬ 
metric and positive definite. 3 

The distribution of sample means of samples from the population (1.1) is 
independent of the distribution of the system of sample variances and covariances 
and is given by 4 

(1.3) Ei(x-) ss TT- nl2 I A Pq | 1/2 *- w) 

where A (x — m, x — m) is the real, positive definite quadratic form of the x p — m p 

N 

with matrix || A p<z ||. Here Jc p = (1/^) ^2 x pa is the sample mean of the pth 


1 Presented to the American Mathematical Society, February 23, 1935. 

2 For more complete reference to the theory of characteristic functions as applied to 
statistics see S. Kullback, Annals of Mathematical Statistics , Vol. 5 (1934), pp. 263-307. 

3 J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 29 (1933), pp. 260 ff.

4 J. Wishart, Biometrika, Vol. 20A (1928), pp. 32-52.

J. Wishart and M. S. Bartlett, loc. cit. 




variate, and $A_{pq} = N B_{pq}$, where $B_{pq}$ has been defined for equation (1.1). The
distribution law of the system of sample variances and covariances is given by 5

(1.4)    $F_2(a) = \dfrac{|A_{pq}|^{(N-1)/2}}{\pi^{n(n-1)/4} \prod_{r=1}^{n} \Gamma\left(\frac{N-r}{2}\right)}\; e^{-A(a)}\, |a_{pq}|^{(N-n-2)/2}$

where $A(a) = \sum_{p,q=1}^{n} A_{pq} a_{pq}$ and $a_{pq} = a_{qp} = (1/N) \sum_{\alpha=1}^{N} (x_{p\alpha} - \bar{x}_p)(x_{q\alpha} - \bar{x}_q)$,
with $A_{pq}$ and $\bar{x}_p$ defined as for (1.3). Since the integral of (1.4) over the entire
field of variation of the $a_{pq}$ is unity, we have 6

(1.5)    $\int e^{-A(a)}\, |a_{pq}|^{(N-n-2)/2}\, da = \pi^{n(n-1)/4}\, |A_{pq}|^{(1-N)/2} \prod_{r=1}^{n} \Gamma\left(\frac{N-r}{2}\right)$.

Equation (1.5) will also hold if the matrix $\|A_{pq}\|$ is complex, provided its real
part is symmetric and positive definite. 7

2. Variance. Consider a sample of N independent items from the normal 
population (1.1). Let 

(2.1)    $v = \sum_{p,q=1}^{n} a_{pq}$

where $a_{pq}$ is defined as in (1.4). From the theory of characteristic functions
and (1.5), we have that the characteristic function of the distribution law of $v$
is given by 8

(2.2)    $\varphi(t) = \int e^{itv} F_2(a)\, da = |A_{pq}|^{(N-1)/2}\, |A_{pq} - it|^{-(N-1)/2}$.

It may be readily shown that

(2.3)    $|A_{pq} - it| = |A_{pq}| - it \sum_{p,q=1}^{n} A^{pq}$

where $A^{pq}$ is the co-factor of $A_{pq}$ in $|A_{pq}|$.

We thus have for the distribution law 8 of $v$

(2.4)    $P(v) = \dfrac{(A/c)^{(N-1)/2}}{2\pi} \int_{-\infty}^{\infty} e^{-itv}\, (A/c - it)^{-(N-1)/2}\, dt$


5 J. Wishart, loc. cit.

6 Cf. S. S. Wilks, Biometrika, Vol. 24 (1932), pp. 471-494.

7 A. E. Ingham, Proc. Cambridge Phil. Soc., Vol. 29 (1933), p. 271 ff. The considerations
in this paper will still hold if the condition above is imposed.

8 S. Kullback, loc. cit., p. 272.





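where $A = |A_{pq}|$, $c = \sum_{p,q=1}^{n} A^{pq}$, and $A/c > 0$ since $\|A_{pq}\|$ is positive definite.
By using the fact that 9

(2.5)    $\dfrac{1}{2\pi i} \int_{a - i\infty}^{a + i\infty} e^{\eta z} z^{-k}\, dz = \begin{cases} \eta^{k-1}/\Gamma(k), & \eta > 0 \\ 0, & \eta \le 0 \end{cases}$

where $k > 0$, $a > 0$, we have

(2.6)    $P(v) = \dfrac{(A/c)^{(N-1)/2}}{\Gamma\left(\frac{N-1}{2}\right)}\, v^{(N-3)/2}\, e^{-(A/c)v}$.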
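The determinant identity (2.3), on which the passage from (2.2) to (2.6) rests, may be checked numerically. The following is only an illustrative sketch: the 3 x 3 positive definite matrix and the value of t are arbitrary assumptions, not values from the paper, and the helper names `det` and `cofactor` are hypothetical.

```python
from itertools import permutations


def det(m):
    """Determinant by the permutation expansion (adequate for small matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation, from its number of inversions.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def cofactor(m, p, q):
    """Signed minor of the element in row p, column q."""
    n = len(m)
    minor = [[m[i][j] for j in range(n) if j != q] for i in range(n) if i != p]
    return (-1) ** (p + q) * det(minor)


# Assumed example: a symmetric positive definite matrix standing in for ||A_pq||.
A = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
t = 0.7
n = len(A)

# Left member of (2.3): every element of the matrix is diminished by it.
lhs = det([[A[p][q] - 1j * t for q in range(n)] for p in range(n)])

# Right member of (2.3): |A_pq| minus it times the sum c of all co-factors.
c = sum(cofactor(A, p, q) for p in range(n) for q in range(n))
rhs = det(A) - 1j * t * c

identity_gap = abs(lhs - rhs)
```

For this matrix the ratio `det(A) / c` is positive, in agreement with the remark that A/c > 0 when the matrix is positive definite.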


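3. Ratio of variances. If $v_1$ and $v_2$ represent the statistic $v$ (defined in
(2.1)), obtained from independent samples of $N_1$ and $N_2$ items respectively, then
it may be shown that the distribution law of $w = v_1/v_2$ is given by 10

(3.1)    $P(w) = \dfrac{\Gamma\left(\frac{N_1+N_2-2}{2}\right)}{\Gamma\left(\frac{N_1-1}{2}\right) \Gamma\left(\frac{N_2-1}{2}\right)}\, w^{(N_1-3)/2}\, (1 + w)^{-(N_1+N_2-2)/2}$.

If we set $w = e^{2z} n_1/n_2$, where $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$, we obtain for the
distribution law of $z$ 11

(3.2)    $P(z) = 2\, \dfrac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right) \Gamma\left(\frac{n_2}{2}\right)}\, n_1^{n_1/2}\, n_2^{n_2/2}\, e^{n_1 z}\, (n_2 + n_1 e^{2z})^{-(n_1+n_2)/2}$.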
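As a check on the form of (3.2), the density of z may be integrated numerically and the total compared with unity. The sketch below assumes the degrees of freedom n1 = 4 and n2 = 6 purely for illustration; the function names and the use of Simpson's rule are choices made here, not part of the paper.

```python
import math


def fisher_z_density(z, n1, n2):
    """Density (3.2) of z, where n1 = N1 - 1 and n2 = N2 - 1."""
    log_const = (
        math.log(2.0)
        + math.lgamma((n1 + n2) / 2)
        - math.lgamma(n1 / 2)
        - math.lgamma(n2 / 2)
        + (n1 / 2) * math.log(n1)
        + (n2 / 2) * math.log(n2)
    )
    log_kernel = n1 * z - ((n1 + n2) / 2) * math.log(n2 + n1 * math.exp(2 * z))
    return math.exp(log_const + log_kernel)


def simpson(f, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


n1, n2 = 4, 6  # assumed degrees of freedom, for illustration only
# The tails beyond |z| = 12 are negligible for these degrees of freedom.
total_mass = simpson(lambda z: fisher_z_density(z, n1, n2), -12.0, 12.0, 4000)
```

The constant is assembled on the logarithmic scale with `math.lgamma` so that larger degrees of freedom do not overflow the gamma functions.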


4. Student’s distribution. Consider a sample of $N$ independent items from
the normal population (1.1). Let

(4.1)    $\mu = \sum_{p,q=1}^{n} (\bar{x}_p - m_p)(\bar{x}_q - m_q)$

where $\bar{x}_p$ and $m_p$ are defined as in (1.3). The characteristic function of the
simultaneous distribution function of $\mu$, defined as in (4.1), and $v$, defined as in
(2.1), is given by

(4.2)    $\varphi(t_1, t_2) = \int \exp\left\{ i t_1 \sum_{p,q=1}^{n} (\bar{x}_p - m_p)(\bar{x}_q - m_q) + i t_2 \sum_{p,q=1}^{n} a_{pq} \right\} F_1(\bar{x})\, F_2(a)\, d\bar{x}\, da$


9 Cf. A. E. Ingham, loc. cit. 

J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc ., Vol. 28 (1932), p. 455 ff. 

10 S. Kullback, note accepted for publication soon in the Annals of Math. Statistics. 

11 Cf. R. A. Fisher, I. Proc. International Math. Congress, Toronto (1924), Vol. 2, pp. 805- 
813. 

R. A. Fisher, II. Statistical Methods for Research Workers , 4th Edition (1932), Edinburgh: 
Oliver and Boyd, pp. 224-227. 





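where $F_1$ and $F_2$ are defined as in (1.3) and (1.4) respectively. From (1.2) and
(1.5) we have that

(4.3)    $\varphi(t_1, t_2) = (A/c)^{N/2}\, (A/c - i t_1)^{-1/2}\, (A/c - i t_2)^{-(N-1)/2}$

where $A$ and $c$ are defined as in (2.4). The simultaneous distribution of $\mu$ and
$v$ is given by

(4.4)    $P(\mu, v) = (1/2\pi)^2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-i t_1 \mu - i t_2 v}\, \varphi(t_1, t_2)\, dt_1\, dt_2$

which evaluated by a procedure similar to that used for (2.4) yields

(4.5)    $P(\mu, v) = \dfrac{(A/c)^{N/2}}{\Gamma\left(\frac{N-1}{2}\right) \Gamma\left(\frac{1}{2}\right)}\, \mu^{-1/2}\, e^{-\mu A/c}\, v^{(N-3)/2}\, e^{-v A/c}$.

From (4.5) we may readily obtain the distribution of $z = \mu^{1/2}/v^{1/2}$ to be 12

(4.6)    $D(z) = 2\, \dfrac{\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right) \Gamma\left(\frac{1}{2}\right)}\, (1 + z^2)^{-N/2}$,    $(0 \le z \le \infty)$.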


5. k samples. Suppose we have $k$ independent samples of $N_1, N_2, \ldots, N_k$
items respectively, drawn from the normal population defined by (1.1). Let
$\mu_r$, $(r = 1, 2, \ldots, k)$ be the statistic $\mu$, defined by (4.1), for each of the $k$ samples
respectively; let $V_r$, $(r = 1, 2, \ldots, k)$ be the statistic $v$, defined by (2.1),
for each of the $k$ samples respectively; let $\mu_0$ and $V_0$ be the values of these statistics
for the sample of $N = N_1 + N_2 + \cdots + N_k$ items obtained by pooling
all the samples.

It may be readily verified that

(5.1)    $\mu_0 = \sum_{r=1}^{k} \mu_r N_r^2/N^2 + 2 \sum_{\alpha,\beta=1}^{k} \mu_\alpha^{1/2} \mu_\beta^{1/2} N_\alpha N_\beta/N^2$,    $(\alpha \ne \beta)$

(5.2)    $N\mu_0 + N V_0 = \sum_{r=1}^{k} (N_r \mu_r + N_r V_r)$

(5.3)    $N V_0 = \sum_{r=1}^{k} (N_r V_r + M_r \mu_r) - 2 \sum_{\alpha,\beta=1}^{k} \mu_\alpha^{1/2} \mu_\beta^{1/2} N_\alpha N_\beta/N$,    $(\alpha \ne \beta)$

where $M_r = (N N_r - N_r^2)/N$.

In view of (2.6) and (4.5), it is evident that the simultaneous distribution
law of $\mu_r$, $V_r$, $(r = 1, 2, \ldots, k)$ is given by

(5.4)    $P(\mu) \cdot Q(v) = \prod_{r=1}^{k} P(\mu_r; N_r)\, Q(V_r; N_r)$


12 Cf. “Student,” Biometrika , Vol. 6 (1908-09), pp. 1-25. 

R. A. Fisher, Metron , Vol. 5 (1925), pp. 90-104. 

P. R. Rider, Annals of Mathematics , 2nd S., Vol. 31 (1930), pp. 579-582. 





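where

(5.5)    $P(\mu_r; N_r) = \dfrac{(N_r B/D)^{1/2}}{\Gamma\left(\frac{1}{2}\right)}\, \mu_r^{-1/2}\, e^{-N_r \mu_r B/D}$

(5.6)    $Q(V_r; N_r) = \dfrac{(N_r B/D)^{(N_r-1)/2}}{\Gamma\left(\frac{N_r-1}{2}\right)}\, V_r^{(N_r-3)/2}\, e^{-N_r V_r B/D}$

and $B$ is the determinant $|B_{pq}|$ defined in (1.1) and $D = \sum_{p,q=1}^{n} B^{pq}$, where $B^{pq}$ is
the co-factor of $B_{pq}$ in $|B_{pq}|$.

Using (5.3) and (5.4), we find that the characteristic function of the simultaneous
distribution law of $\phi_r = V_r B/D$, $(r = 0, 1, \ldots, k)$ is given by

(5.7)    $\varphi(t_0, t_1, \ldots, t_k) = \int e^{U(t_0) + V(t_0, \ldots, t_k)}\, P(\mu)\, Q(v)\, d\mu\, dv$

where

$U(t_0) = (B\, i t_0/D) \left\{ \sum_{r=1}^{k} \mu_r M_r/N - 2 \sum_{\alpha,\beta=1}^{k} \mu_\alpha^{1/2} \mu_\beta^{1/2} N_\alpha N_\beta/N^2 \right\}$,    $(\alpha \ne \beta)$

and

$V(t_0, \ldots, t_k) = (B/D) \left\{ \sum_{r=1}^{k} V_r (i t_r + i t_0 N_r/N) \right\}$.

Let $\mu_r B/D = \xi_r^2$ and $V_r B/D = \eta_r$, $(r = 1, 2, \ldots, k)$ and rewrite (5.7) as
the product of $k + 1$ integrals

(5.8)    $\varphi(t_0, t_1, \ldots, t_k) = I_0 I_1 \cdots I_k$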

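where

(5.9)    $I_0 = \dfrac{(N_1 N_2 \cdots N_k)^{1/2}}{\Gamma\left(\frac{1}{2}\right)^k} \int \exp\left\{ -\sum_{r=1}^{k} (N_r - i t_0 M_r/N)\, \xi_r^2 - 2 i t_0 \sum_{\alpha,\beta=1}^{k} \xi_\alpha \xi_\beta N_\alpha N_\beta/N^2 \right\} d\xi$,    $(\alpha \ne \beta)$

(5.10)    $I_r = \dfrac{N_r^{(N_r-1)/2}}{\Gamma\left(\frac{N_r-1}{2}\right)} \int_0^{\infty} \eta_r^{(N_r-3)/2}\, e^{-(N_r - i t_0 N_r/N - i t_r)\eta_r}\, d\eta_r$,    $(r = 1, 2, \ldots, k)$.

By employing (1.2) we find that

(5.11)    $I_0 = (N_1 N_2 \cdots N_k)^{1/2} \begin{vmatrix} N_1 - i t_0 M_1/N & i t_0 N_1 N_2/N^2 & \cdots & i t_0 N_1 N_k/N^2 \\ i t_0 N_2 N_1/N^2 & N_2 - i t_0 M_2/N & \cdots & i t_0 N_2 N_k/N^2 \\ \vdots & \vdots & & \vdots \\ i t_0 N_k N_1/N^2 & i t_0 N_k N_2/N^2 & \cdots & N_k - i t_0 M_k/N \end{vmatrix}^{-1/2}$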

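The determinant may be readily evaluated by removing the common factor $N_r$
from the $r$th row (remembering the value of $M_r$ as given in (5.3)) and applying
the operations 13 (row 1 - row 2), (row 2 - row 3), ..., and then column $k$ +
column 1 + column 2 + ... + column $k - 1$. We thus obtain

(5.12)    $I_0 = (1 - i t_0/N)^{-(k-1)/2}$.

The integral in (5.10) is well-known and yields

(5.13)    $I_r = N_r^{(N_r-1)/2}\, (N_r - i t_0 N_r/N - i t_r)^{-(N_r-1)/2}$.

There thus results

(5.14)    $\varphi(t_0, t_1, \ldots, t_k) = G\, (1 - i t_0/N)^{-(k-1)/2} \prod_{\alpha=1}^{k} (N_\alpha - i t_0 N_\alpha/N - i t_\alpha)^{-(N_\alpha-1)/2}$

where $G = \prod_{\alpha=1}^{k} N_\alpha^{(N_\alpha-1)/2}$.

The simultaneous distribution law of $\phi_r$, $(r = 0, 1, \ldots, k)$ is given by

(5.15)    $P(\phi_0, \phi_1, \ldots, \phi_k) = \dfrac{G}{(2\pi)^{k+1}} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \dfrac{e^{-i t_0 \phi_0 - i t_1 \phi_1 - \cdots - i t_k \phi_k}\, dt_0\, dt_1 \cdots dt_k}{(1 - i t_0/N)^{(k-1)/2} \prod_{\alpha=1}^{k} (N_\alpha - i t_0 N_\alpha/N - i t_\alpha)^{(N_\alpha-1)/2}}$.

Integrating successively with respect to $t_k, t_{k-1}, \ldots, t_1$ and applying (2.5) we have

(5.16)    $P(\phi_0, \phi_1, \ldots, \phi_k) = G \exp\left\{ -\sum_{\alpha=1}^{k} N_\alpha \phi_\alpha \right\} \prod_{\alpha=1}^{k} \dfrac{\phi_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)} \cdot \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \dfrac{e^{-i t_0 \left( \phi_0 - N_1 \phi_1/N - \cdots - N_k \phi_k/N \right)}\, dt_0}{(1 - i t_0/N)^{(k-1)/2}}$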


13 Cf. A. C. Aitken, Quarterly Journal Math., Vol. 2 (1931), pp. 130-136.






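and finally

(5.17)    $P(\phi_0, \phi_1, \ldots, \phi_k) = \dfrac{G\, N^{(k-1)/2}\, e^{-N\phi_0}\, (\phi_0 - N_1 \phi_1/N - \cdots - N_k \phi_k/N)^{(k-3)/2}}{\Gamma\left(\frac{k-1}{2}\right)} \prod_{\alpha=1}^{k} \dfrac{\phi_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$.

If we apply to (5.17) the transformation

(5.18)    $\phi_0 = \phi_0$,    $\phi_r = N \zeta_r \phi_0/N_r$    $(r = 1, 2, \ldots, k)$

and integrate out $\phi_0$, we obtain for the simultaneous distribution law of $\zeta_r = N_r \phi_r/N \phi_0 = N_r V_r/N V_0$

(5.19)    $D(\zeta_1, \zeta_2, \ldots, \zeta_k) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\, (1 - \zeta_1 - \zeta_2 - \cdots - \zeta_k)^{(k-3)/2} \prod_{\alpha=1}^{k} \dfrac{\zeta_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$

where the limits of variation in (5.19) are 14

(5.20)    $0 \le \zeta_1 \le 1$;    $0 \le \zeta_r \le 1 - \zeta_1 - \zeta_2 - \cdots - \zeta_{r-1}$,    $(r = 2, 3, \ldots, k)$.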


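6. Correlation ratio. Let $\zeta = \log (1 - \zeta_1 - \zeta_2 - \cdots - \zeta_k)$ where the
$\zeta_r$, $(r = 1, 2, \ldots, k)$ are defined and distributed as in (5.19). The characteristic
function of the distribution law of $\zeta$ is given by

(6.1)    $\varphi(t) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)} \int (1 - \zeta_1 - \zeta_2 - \cdots - \zeta_k)^{(k+2it-3)/2} \prod_{\alpha=1}^{k} \dfrac{\zeta_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}\, d\zeta$

where the limits of variation are given by (5.20). The integral in (6.1) is readily
evaluated as a Dirichlet integral, 15 and we obtain

(6.2)    $\varphi(t) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)\, \Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)\, \Gamma\left(\frac{N-1+2it}{2}\right)}$.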

14 Cf. J. Neyman and E. S. Pearson, I. Bulletin de l'Académie Polonaise des Sciences et
des Lettres, Série A, Sciences Mathématiques, 1931, pp. 460-481.

15 E. Goursat-E. R. Hedrick, Mathematical Analysis , Vol. I (1904) (Ginn and Co., N. Y.), 
p. 308. 






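The distribution law of $\zeta$ is given by

(6.3)    $P(\zeta) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)} \cdot \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\zeta}\, \dfrac{\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{N-1+2it}{2}\right)}\, dt$.

Now it may be shown that 16

(6.4)    $\dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\zeta}\, \dfrac{\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{N-1+2it}{2}\right)}\, dt = \dfrac{e^{\zeta(k-1)/2}\, (1 - e^{\zeta})^{(N-k-2)/2}}{\Gamma\left(\frac{N-k}{2}\right)}$

so that

(6.5)    $P(\zeta) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right) \Gamma\left(\frac{N-k}{2}\right)}\, e^{\zeta(k-1)/2}\, (1 - e^{\zeta})^{(N-k-2)/2}$.

If we set $e^{\zeta} = \eta^2$, then we obtain for the distribution 17 of $\eta^2$

(6.6)    $D(\eta^2) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right) \Gamma\left(\frac{N-k}{2}\right)}\, (\eta^2)^{(k-3)/2}\, (1 - \eta^2)^{(N-k-2)/2}$.

From its definition we have that

(6.7)    $\eta^2 = (N V_0 - N_1 V_1 - \cdots - N_k V_k)/N V_0$

which reduces to

(6.8)    $\eta^2 = (N_1 W_1 + N_2 W_2 + \cdots + N_k W_k)/N V_0$

where $W_\alpha = \sum_{p,q=1}^{n} (\bar{x}_{p\alpha} - \bar{x}_{p0})(\bar{x}_{q\alpha} - \bar{x}_{q0})$ with $\bar{x}_{p\alpha}$ the sample mean of the $p$th
variate in the $\alpha$th sample and $\bar{x}_{p0}$ the sample mean of the $p$th variate in the
sample formed by pooling all the samples. 18

In a similar manner, we have that the distribution law of $\eta_\alpha^2$,
$(\alpha = 1, 2, \ldots, k)$ is given by

(6.9)    $D(\eta_\alpha^2) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right) \Gamma\left(\frac{N-N_\alpha}{2}\right)}\, (\eta_\alpha^2)^{(N_\alpha-3)/2}\, (1 - \eta_\alpha^2)^{(N-N_\alpha-2)/2}$.

It may be of interest to point out another derivation for the distribution of
$W = 1 - \eta^2$. Let

(6.10)    $\theta = (B/D)(N_1 V_1 + N_2 V_2 + \cdots + N_k V_k)$,    $\theta_0 = (B/D) N V_0$.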


16 Whittaker and Watson, Modern Analysis, 2nd Ed., pp. 283, 333.

17 Cf. R. A. Fisher, loc. cit., I. 

H. Hotelling, Proc. National Academy of Sciences, Vol. XI (1925), pp. 657-662. 

18 Cf. S. S. Wilks, loc. cit., p. 482. 






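The characteristic function of the simultaneous distribution law of $\theta$ and $\theta_0$ is
immediately derivable from (5.14) by replacing $t_0$ by $N t_0$ and $t_r$ by $N_r t$,
$(r = 1, 2, \ldots, k)$. There results

(6.11)    $\varphi(t, t_0) = (1 - i t_0)^{-(k-1)/2}\, (1 - i t_0 - i t)^{-(N-k)/2}$.

By a procedure similar to that already used we find that the simultaneous
distribution law of $\theta$ and $\theta_0$ is given by

(6.12)    $P(\theta, \theta_0) = \dfrac{\theta^{(N-k-2)/2}\, (\theta_0 - \theta)^{(k-3)/2}\, e^{-\theta_0}}{\Gamma\left(\frac{N-k}{2}\right) \Gamma\left(\frac{k-1}{2}\right)}$.

By applying to (6.12) the transformation $\theta = \theta_0 W$, $\theta_0 = \theta_0$ and integrating out
the value of $\theta_0$, we find for the distribution law of $W$

(6.13)    $D(W) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N-k}{2}\right) \Gamma\left(\frac{k-1}{2}\right)}\, W^{(N-k-2)/2}\, (1 - W)^{(k-3)/2}$.

From (6.12) and (6.10) it may be shown that the following estimates of variance
all have the same expected value 19

(6.14)    $\dfrac{N_1 V_1 + N_2 V_2 + \cdots + N_k V_k}{N - k}$,    $\dfrac{N V_0}{N - 1}$,    $\dfrac{N_1 W_1 + N_2 W_2 + \cdots + N_k W_k}{k - 1}$.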
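The three estimates in (6.14) may be computed directly from pooled data, and the relation between (6.7) and (6.8), namely that N V_0 equals the sum of the N_r V_r plus the sum of the N_r W_r, gives a convenient numerical check. The sketch below is a minimal illustration: the three bivariate samples are invented for the purpose, and the helper names are hypothetical.

```python
def mean_vector(sample):
    """Sample mean of each variate."""
    n = len(sample[0])
    return [sum(obs[p] for obs in sample) / len(sample) for p in range(n)]


def v_statistic(sample):
    """V = sum over p, q of a_pq, with a_pq the sample (co)variances of (1.4)."""
    xbar = mean_vector(sample)
    n = len(xbar)
    total = 0.0
    for obs in sample:
        for p in range(n):
            for q in range(n):
                total += (obs[p] - xbar[p]) * (obs[q] - xbar[q])
    return total / len(sample)


def w_statistic(sample, pooled_mean):
    """W of (6.8): products of deviations of sample means from pooled means."""
    xbar = mean_vector(sample)
    n = len(xbar)
    return sum(
        (xbar[p] - pooled_mean[p]) * (xbar[q] - pooled_mean[q])
        for p in range(n)
        for q in range(n)
    )


# Invented data: k = 3 samples of bivariate observations.
samples = [
    [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (2.5, 2.0)],
    [(4.0, 3.0), (5.0, 4.5), (4.5, 3.5)],
    [(0.5, 1.0), (1.5, 0.5), (1.0, 2.0), (2.0, 1.5), (0.0, 1.0)],
]
pooled = [obs for s in samples for obs in s]
N, k = len(pooled), len(samples)
pooled_mean = mean_vector(pooled)

within_ss = sum(len(s) * v_statistic(s) for s in samples)                # sum of N_r V_r
between_ss = sum(len(s) * w_statistic(s, pooled_mean) for s in samples)  # sum of N_r W_r
total_ss = N * v_statistic(pooled)                                       # N V_0

estimates = {
    "within": within_ss / (N - k),
    "total": total_ss / (N - 1),
    "between": between_ss / (k - 1),
}
```

For a single set of samples the three values in `estimates` will of course differ; (6.14) asserts only that their expected values agree when all samples come from the same population.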


7. Distribution of variances. Let

(7.1)    $\theta_r = N_r V_r B/D$    $(r = 1, 2, \ldots, k)$,    $\theta_0 = N V_0 B/D$,    $\theta = (B/D)(N_1 V_1 + N_2 V_2 + \cdots + N_k V_k)$

where the right members of (7.1) are defined as in section 5. It is evident that
the characteristic function of the simultaneous distribution law of $\theta$, $\theta_0$, $\theta_r$,
$(r = 1, 2, \ldots, k - 1)$ is derivable from (5.14) by replacing $t_0$ by $N t_0$, $t_r$ by
$N_r (t_r + t)$, $(r = 1, 2, \ldots, k - 1)$ and $t_k$ by $N_k t$. Thus

(7.2)    $\varphi(t, t_0, t_1, \ldots, t_{k-1}) = (1 - i t_0)^{-(k-1)/2}\, (1 - i t_0 - i t)^{-(N_k-1)/2} \prod_{\alpha=1}^{k-1} (1 - i t_0 - i t - i t_\alpha)^{-(N_\alpha-1)/2}$.


19 Cf. J. Neyman and E. S. Pearson, II. Biometrika, Vol. 20A (1928), pp. 273-274.
S. Kullback, Annals of Mathematical Statistics, Vol. 6 (1935), pp. 76-77. 






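By proceeding as in section 5 we arrive at the result that the simultaneous
distribution law of $\theta$, $\theta_0$, $\theta_r$, $(r = 1, 2, \ldots, k - 1)$ is given by

(7.3)    $P(\theta, \theta_0, \theta_r) = \dfrac{(\theta_0 - \theta)^{(k-3)/2}\, e^{-\theta_0}}{\Gamma\left(\frac{k-1}{2}\right)} \cdot \dfrac{(\theta - \theta_1 - \cdots - \theta_{k-1})^{(N_k-3)/2}}{\Gamma\left(\frac{N_k-1}{2}\right)} \prod_{\alpha=1}^{k-1} \dfrac{\theta_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$

where $\theta_0 \ge \theta$, $\theta \ge \theta_1 + \theta_2 + \cdots + \theta_{k-1}$.

By integrating out the variable $\theta_0$ from (7.3) we have for the simultaneous
distribution law of $\theta$, $\theta_r$, $(r = 1, 2, \ldots, k - 1)$

(7.4)    $D(\theta, \theta_r) = \dfrac{e^{-\theta}\, (\theta - \theta_1 - \cdots - \theta_{k-1})^{(N_k-3)/2}}{\Gamma\left(\frac{N_k-1}{2}\right)} \prod_{\alpha=1}^{k-1} \dfrac{\theta_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$.

A procedure similar to that used to derive (5.19) yields for the simultaneous
distribution law of

(7.5)    $h_r = \theta_r/\theta$    $(r = 1, 2, \ldots, k - 1)$

(7.6)    $P(h_1, h_2, \ldots, h_{k-1}) = \dfrac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N_k-1}{2}\right)}\, (1 - h_1 - \cdots - h_{k-1})^{(N_k-3)/2} \prod_{\alpha=1}^{k-1} \dfrac{h_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$

where the limits of variation in (7.6) are 20

(7.7)    $0 \le h_1 \le 1$;    $0 \le h_r \le 1 - h_1 - h_2 - \cdots - h_{r-1}$,    $(r = 2, \ldots, k - 1)$.

In a manner similar to the derivation of (6.6) we find the distribution law of
$h_\alpha$, $(\alpha = 1, 2, \ldots, k - 1)$, $h_k = 1 - h_1 - h_2 - \cdots - h_{k-1}$ to be

(7.8)    $D(h_\alpha) = \dfrac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right) \Gamma\left(\frac{N-k-N_\alpha+1}{2}\right)}\, h_\alpha^{(N_\alpha-3)/2}\, (1 - h_\alpha)^{(N-k-N_\alpha-1)/2}$,    $(\alpha = 1, 2, \ldots, k)$.

From the distribution law in (7.3) we readily obtain that the characteristic
function of the distribution law of $y_\alpha = \log (\theta_\alpha/(\theta_0 - \theta))$ is given by

(7.9)    $\varphi(t) = \dfrac{\Gamma\left(\frac{N_\alpha-1+2it}{2}\right)\, \Gamma\left(\frac{k-1-2it}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)\, \Gamma\left(\frac{k-1}{2}\right)}$.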


20 Cf. J. Neyman and E. S. Pearson, loc. cit., I.







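We thus have that the distribution law of $y_\alpha$ is given by

(7.10)    $P(y_\alpha) = \dfrac{1}{\Gamma\left(\frac{N_\alpha-1}{2}\right) \Gamma\left(\frac{k-1}{2}\right)} \cdot \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it y_\alpha}\, \Gamma\left(\frac{N_\alpha-1+2it}{2}\right) \Gamma\left(\frac{k-1-2it}{2}\right) dt$.

The integral in (7.10) is known, 21 and there results

(7.11)    $P(y_\alpha) = \dfrac{\Gamma\left(\frac{N_\alpha+k-2}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right) \Gamma\left(\frac{k-1}{2}\right)}\, e^{y_\alpha (N_\alpha-1)/2}\, (1 + e^{y_\alpha})^{-(N_\alpha+k-2)/2}$.

If we set $e^{y_\alpha} = \theta_\alpha/(\theta_0 - \theta) = \lambda_\alpha^2$ we have for the distribution of $\lambda_\alpha^2$

(7.12)    $D(\lambda_\alpha^2) = \dfrac{\Gamma\left(\frac{N_\alpha+k-2}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right) \Gamma\left(\frac{k-1}{2}\right)}\, (\lambda_\alpha^2)^{(N_\alpha-3)/2}\, (1 + \lambda_\alpha^2)^{-(N_\alpha+k-2)/2}$.

An extension of the procedure used to obtain (7.9) yields as the characteristic
function of the simultaneous distribution of $y_1, y_2, \ldots, y_k$

(7.13)    $\varphi(t_1, t_2, \ldots, t_k) = \dfrac{\Gamma\left(\frac{k-1-2it_1-\cdots-2it_k}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)} \prod_{\alpha=1}^{k} \dfrac{\Gamma\left(\frac{N_\alpha-1+2it_\alpha}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$.

Successive application of the method used to evaluate (7.10) yields as the simultaneous
distribution law of the $y_\alpha$

(7.14)    $P(y_1, y_2, \ldots, y_k) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\, (1 + e^{y_1} + \cdots + e^{y_k})^{-(N-1)/2} \prod_{\alpha=1}^{k} \dfrac{e^{y_\alpha (N_\alpha-1)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$.

The simultaneous distribution of the $\lambda_\alpha^2$ defined as in (7.12) is given by

(7.15)    $D(\lambda_1^2, \lambda_2^2, \ldots, \lambda_k^2) = \dfrac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\, (1 + \lambda_1^2 + \cdots + \lambda_k^2)^{-(N-1)/2} \prod_{\alpha=1}^{k} \dfrac{(\lambda_\alpha^2)^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}$.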


21 Whittaker and Watson, loc. cit., pp. 283, 383.





8. Conclusion. In this paper we have presented further instances of the 
applicability of the theory of characteristic functions to the distribution problem 
of statistics. In a subsequent paper the author hopes to illustrate the applica¬ 
tion of the results here developed to specific numerical problems. 

Washington, D. C.



ON A CRITERION FOR THE REJECTION OF OBSERVATIONS AND 
THE DISTRIBUTION OF THE RATIO OF DEVIATION TO 
SAMPLE STANDARD DEVIATION 

By William R. Thompson 

Criteria for the rejection of outlying observations may be designed to reject a 
given fraction of all observations, or a proportion varying with the size of the 
sample. Irwin 1 has discussed several criteria based on sampling from a normal 
population which had been used previously, as well as one which he proposed. 
This is based on the principle of fixing the expectation of rejecting an observation
from a sample independently of the aggregate number, N, of the sample. The
criterion, $\lambda$, is $1/\sigma$ times the interval between successive observations in ascending
order of magnitude, where $\sigma$ is the standard deviation of the sampled population.
In the same paper he gave, for different values of N, a table of $P_1(\lambda)$ and $P_2(\lambda)$,
respectively probabilities of exceeding given values of $\lambda$ for the first or second
such interval from either end. In actual use, however, $\sigma$ is estimated from the
sample standard deviation, and we are left to decide whether observations in
question are to be included or not in estimating the standard deviation as also
whether or not to modify this by addition or subtraction of an estimate of its
probable error. The object of the present communication is to develop a
criterion free from defects of this nature, depending only on the assumption of
random sampling from a normal universe. For this purpose we develop the
distribution of $\tau$ defined by

(1)    $\tau = \delta/s$

where $s$ is the sample standard deviation and $\delta$ is the deviation of an arbitrary
observation of the sample from the sample mean. This leads to definite criteria,
which are simple in application.

Accordingly, consider a sample $\{x_i\}$, $i = 1, \ldots, N$, to be drawn at random
from a normal population of unknown mean and standard deviation, and that
the order of enumeration is arbitrary. Then $x_N$ is an arbitrary one of the elements
or observations. Now, let

(2)    $\bar{x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$,    $s^2 = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$,

and

(3)    $\delta = x_N - \bar{x}$.




Then we will prove that the distribution of $\tau \equiv \delta/s$ in repeated sampling with a
fixed aggregate number, N, is given by substitution of

$\sqrt{n} \cdot z = t = \sqrt{n} \cdot \tau / \sqrt{n + 1 - \tau^2}$

in the z or t distribution of “Student” and R. A. Fisher, 2 where $n = N - 2$.
To this end let $N > 2$, and let $n = N - 2$, and

(4)    $(n + 1)\bar{x}_1 = \sum_{i=1}^{n+1} x_i$,    and    $S_1(x - \bar{x}_1)^2 = \sum_{i=1}^{n+1} (x_i - \bar{x}_1)^2$.

Obviously, $(n + 1)\bar{x}_1 + x_N = N \cdot \bar{x}$, whence

(5)    $\bar{x} - \bar{x}_1 = \dfrac{x_N - \bar{x}}{n + 1} = \dfrac{\delta}{n + 1}$,    whence    $x_N - \bar{x}_1 = \dfrac{n + 2}{n + 1} \cdot \delta$.

Furthermore, $N \cdot s^2 = S_1(x - \bar{x}_1)^2 + (n + 1)(\bar{x}_1 - \bar{x})^2 + (x_N - \bar{x})^2$, whence

(6)    $N s^2 = S_1(x - \bar{x}_1)^2 + \dfrac{n + 2}{n + 1} \cdot \delta^2$.


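Now, considering the separate samples, $\{x_i\}$, $i = 1, \ldots, N - 1$, and $\{x_N\}$,
of aggregate number, $N - 1$ and 1, respectively; Fisher has shown 2 that if we
set

(7)    $t = \dfrac{(x_N - \bar{x}_1) \cdot \sqrt{n}}{\sqrt{S_1(x - \bar{x}_1)^2}} \cdot \sqrt{\dfrac{n + 1}{n + 2}}$,

then, for $t_0 > 0$, the probability, $p$, that $t < t_0$ is

(8)    $p = \dfrac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\; \Gamma\left(\frac{n}{2}\right)} \int_{-\infty}^{t_0} \left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2} dt$

and $P = 2(1 - p)$ is the probability that $|t| > t_0$.
Now, (5) and (6) in (7) give

(9)    $t = \dfrac{\frac{n+2}{n+1} \cdot \delta}{\sqrt{(n + 2)\left(s^2 - \frac{\delta^2}{n+1}\right)}} \cdot \dfrac{\sqrt{n} \cdot \sqrt{n + 1}}{\sqrt{n + 2}} = \dfrac{\tau \sqrt{n}}{\sqrt{n + 1 - \tau^2}}$,

whence

(10)    $\tau = t \sqrt{\dfrac{n + 1}{n + t^2}}$,    or    $\dfrac{\tau}{\sqrt{n + 1}} = \sin \theta$,    $\tan \theta = \dfrac{t}{\sqrt{n}} = z$.

Accordingly, $P$ is the probability that $|\tau| > \tau_0 = t_0 \sqrt{\dfrac{n + 1}{n + t_0^2}}$.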
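The passage between t and the ratio of deviation to sample standard deviation in (9) and (10) is easily put into computational form. The sketch below is an illustration only; the function names are hypothetical, and the two numerical checks use the tabulated t of 4.30 for n = 2 and 3.182 for n = 100 from Table I.

```python
import math


def tau_from_t(t, n):
    """Relation (10): the ratio delta/s corresponding to Student's t, n = N - 2."""
    return t * math.sqrt((n + 1) / (n + t * t))


def t_from_tau(tau, n):
    """Inverse relation (9); requires tau**2 < n + 1."""
    return math.sqrt(n) * tau / math.sqrt(n + 1 - tau * tau)


# N = 4 (n = 2) and N = 102 (n = 100), using the t values printed in Table I.
tau_n2 = tau_from_t(4.30, 2)
tau_n100 = tau_from_t(3.182, 100)

# Applying (10) and then (9) should return the starting value of t.
round_trip = t_from_tau(tau_from_t(2.5, 7), 7)
```

Since the ratio cannot exceed the square root of n + 1 in absolute value, `t_from_tau` is defined only inside that bound; the limiting value corresponds to an infinite t.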

Thus, if we want to determine $\tau_0$ so that by rejecting all observations deviating
from the sample mean by more than $s \cdot \tau_0$ we shall have an average relative
frequency of rejections per sample which is fixed, say $\phi$; then we need only
to set $P = \phi/N$. This follows at once from the hypothesis as $x_N$ is a random
element of the random sample of N elements drawn from the same normal
universe (of unknown mean and standard deviation). The criterion of rejection,
$s \cdot \tau_0$, is uniquely determined from the sample standard deviation and
number of elements, N, for any prescribed $\phi$. Dropping the subscript, critical
values of $\tau$ are given in Table I (together with corresponding values of t)
for $\phi$ = 0.2, 0.1, and 0.05 and values of $n \equiv N - 2$ which should be sufficient
for most practical purposes. The normal deviate (for unit standard deviation
and the same P) lies between these values and is approached by $\tau$ and t (in the
tabulated range of $\phi$) from opposite sides as n increases, the approximation to $\tau$
being the closer of the two. Accordingly Sheppard's tables may be used with
good approximation for n > 1000, with $\phi/N = P$, the probability of exceeding
numerically the given deviate. They may be used to advantage also in interpolation
between n = 100, 1000 by means of differences at the tabulated
points.


TABLE I

            τ for given φ                      t for given φ
    N     φ = 0.2     0.1        0.05        0.2       0.1       0.05         n

    3     1.40646     1.41228    1.41373     9.51      19.08     38.19        1
    4     1.6454      1.6887     1.7103      4.30      6.20      8.84         2
    5     1.791       1.869      1.917       3.48      4.54      5.84         3
    6     1.895       1.997      2.067       3.19      3.97      4.84         4
    7     1.973       2.093      2.182       3.04      3.68      4.38         5
    8     2.041       2.170      2.274       2.97      3.51      4.12         6
    9     2.099       2.237      2.348       2.93      3.42      3.94         7
   10     2.144       2.295      2.413       2.89      3.36      3.83         8
   11     2.190       2.343      2.472       2.88      3.31      3.76         9
   12     2.229       2.388      2.521       2.87      3.28      3.70        10
   13     2.262       2.425      2.567       2.86      3.25      3.66        11
   14     2.296       2.463      2.598       2.86      3.24      3.60        12
   15     2.325       2.497      2.636       2.86      3.23      3.58        13
   16     2.357       2.522      2.670       2.87      3.21      3.56        14
   17     2.382       2.553      2.699       2.87      3.21      3.54        15
   18     2.404       2.576      2.733       2.87      3.20      3.54        16
   19     2.429       2.601      2.759       2.88      3.20      3.53        17
   20     2.448       2.625      2.783       2.88      3.20      3.52        18
   21     2.471       2.647      2.800       2.89      3.20      3.50        19
   22     2.487       2.661      2.819       2.89      3.19      3.49        20
   32     2.636       2.819      2.985       2.944     3.216     3.479       30
   42     2.737       2.925      3.093       2.991     3.248     3.489       40
  102     3.047       3.233      3.407       3.182     3.397     3.603      100
  202     3.266       3.448      3.621       3.347     3.546     3.736      200
  502     3.528       3.704      3.872       3.569     3.752     3.927      500
 1002     3.714       3.881      4.047       3.737     3.908     4.078     1000

P = φ/N.

Note: τ is computed to 0.5 unit in the last place given from the given t which is believed
correct to 1 unit in the last place.

A crude rejection system where we reject an observation if it deviate from the
mean of all others by more than a fixed constant times the standard deviation of
such a difference in terms of $\sigma$ as estimated from the variance of these others by

[expression not legible in the source]

amounts to taking a fixed value of t as criterion. The
intention is usually to fix the probability (P) of rejection of observations rather
than the expectation of rejections per sample ($\phi$); and this, of course, is the
expected approximate result for large samples. For small samples, however, say
4 < N < 32, by rejection of observations deviating thus by more than
3 times [expression not legible in the source] it
appears from (7) and Table I that approximately $\phi$ would
be fixed rather than P.

The $\tau$-criterion not only affords a precise extension of such a rejection system,
but also a reduction of the actual process of application to a minimum, with one
noteworthy exception for the case, N = 3. Here we may use as criterion with
identical effect the ratio, $d_2/d_1$, where $x_1 \le x_2 \le x_3$, $d_2 = x_3 - x_2$, $d_1 = x_2 - x_1$, and
$d_2 \ge d_1$. This order can always be adopted for the test, and it is readily verified
that

(11)    $\dfrac{d_2}{d_1} = \dfrac{\sqrt{3} \cdot t - 1}{2}$,

whence for $\phi$ = 0.2, 0.1, and 0.05, respectively we have $d_2/d_1$ = 7.74, 16.0, and 32.6.
Thus, for N = 3, we may take merely the ratio of the greater to the other
numerical deviation from the median observation as criterion.
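The equivalence asserted in (11) for N = 3 can be verified on any triple of observations by computing the ratio of the two gaps directly and again through t. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the sample (0, 1, 5) is invented, and triples with two equal observations (for which d1 = 0 and t is infinite) are not handled.

```python
import math


def ratio_from_t(t):
    """Relation (11) for N = 3: d2/d1 in terms of Student's t with n = 1."""
    return (math.sqrt(3.0) * t - 1.0) / 2.0


def ratio_and_t_from_sample(x1, x2, x3):
    """Return d2/d1 and the t of the extreme observation for three distinct values."""
    lo, mid, hi = sorted((x1, x2, x3))
    d1, d2 = mid - lo, hi - mid
    if d2 < d1:
        # Reflect the sample so that the larger gap is d2, as the text prescribes.
        lo, mid, hi = -hi, -mid, -lo
        d1, d2 = d2, d1
    mean = (lo + mid + hi) / 3.0
    s = math.sqrt(sum((x - mean) ** 2 for x in (lo, mid, hi)) / 3.0)
    tau = (hi - mean) / s                  # deviation of the extreme observation over s
    t = tau / math.sqrt(2.0 - tau * tau)   # relation (9) with n = 1
    return d2 / d1, t


direct_ratio, t_value = ratio_and_t_from_sample(0.0, 1.0, 5.0)
via_t = ratio_from_t(t_value)

# Critical ratios implied by the n = 1 row of Table I.
critical_ratios = [ratio_from_t(t) for t in (9.51, 19.08, 38.19)]
```

For the invented sample the two routes agree at a ratio of 4, well below the smallest tabulated critical ratio, so no observation would be rejected.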


Section 2 

Although not required in connection with the rejection criterion developed 
above, there is a simple generalization of r with a closely related distribution 
which may be valuable in somewhat different circumstances. Consider the same 
situation as given above, except that $\{x_i\}$ is divided into two subsets, where
$i = 1, \ldots, N - k$, and $i = N - k + 1, \ldots, N$, respectively; giving two
random samples of aggregate number, $N - k$ and $k$. Let the means of these be
$\bar{x}_1$ and $\bar{x}_2$, respectively; and $s$ and $\bar{x}$ be as before. Then in general let

(12)    $\delta \equiv \bar{x}_2 - \bar{x}$    and    $\tau \equiv \delta/s$.



TABLE II

τ(P, N, 1)

   N   P = 0.9    0.8       0.7       0.6       0.5       0.4       0.3       0.2       0.1       0.05      0.02      0.01

   3    .221      .437      .643      .832     1.000     1.144     1.260     1.3450    1.3968    1.4099    1.41352   1.414039
   4    .173      .347      .520      .693      .866     1.039     1.212     1.386     1.559     1.6080    1.6974    1.7147
   5    .158      .316      .476      .639      .808      .983     1.170     1.374     1.611     1.757     1.869     1.9175
   6    .149      .300      .453      .612      .777      .952     1.143     1.360     1.631     1.814     1.973     2.0509
   7    .144      .290      .440      .594      .757      .932     1.125     1.349     1.640     1.848     2.040     2.142
   8    .141      .284      .431      .583      .744      .918     1.111     1.340     1.644     1.870     2.087     2.207
   9    .139      .280      .425      .575      .734      .907     1.102     1.334     1.647     1.885     2.121     2.256
  10    .137      .276      .420      .569      .727      .899     1.094     1.328     1.648     1.895     2.146     2.294
  11    .136      .274      .416      .564      .721      .893     1.088     1.324     1.648     1.904     2.166     2.324
  12    .135      .272      .413      .560      .717      .888     1.083     1.320     1.649     1.910     2.183     2.348
  13    .134      .270      .411      .557      .713      .884     1.080     1.317     1.649     1.915     2.196     2.368
  14    .134      .269      .408      .554      .710      .881     1.076     1.314     1.649     1.919     2.207     2.385
  15    .133      .268      .407      .552      .707      .878     1.073     1.312     1.649     1.923     2.216     2.399
  16    .133      .267      .405      .550      .705      .875     1.071     1.310     1.649     1.926     2.224     2.411
  17    .132      .266      .404      .548      .703      .873     1.069     1.309     1.649     1.928     2.231     2.422
  18    .132      .265      .403      .547      .701      .871     1.067     1.307     1.649     1.931     2.237     2.432
  19    .131      .264      .402      .546      .699      .869     1.065     1.305     1.649     1.932     2.242     2.440
  20    .131      .264      .401      .544      .698      .868     1.063     1.304     1.649     1.934     2.247     2.447
  21    .130      .263      .400      .543      .697      .867     1.062     1.303     1.649     1.936     2.251     2.454
  22    .130      .263      .399      .542      .696      .865     1.061     1.302     1.649     1.937     2.255     2.460
  23    .130      .262      .398      .541      .695      .864     1.059     1.301     1.649     1.938     2.259     2.465
  24    .130      .262      .398      .541      .694      .863     1.058     1.300     1.649     1.940     2.262     2.470
  25    .130      .261      .397      .540      .693      .862     1.057     1.299     1.649     1.941     2.264     2.475
  26    .130      .261      .397      .539      .692      .861     1.056     1.299     1.648     1.942     2.267     2.479
  27    .129      .261      .397      .538      .691      .860     1.056     1.298     1.648     1.942     2.269     2.483
  28    .129      .261      .396      .538      .691      .860     1.055     1.297     1.648     1.943     2.272     2.487
  29    .129      .260      .396      .537      .690      .859     1.054     1.297     1.648     1.944     2.274     2.490
  30    .129      .260      .395      .537      .690      .859     1.054     1.296     1.648     1.944     2.275     2.493
  31    .129      .260      .395      .536      .689      .858     1.054     1.296     1.648     1.945     2.277     2.495
  32    .129      .260      .394      .536      .689      .858     1.053     1.295     1.648     1.945     2.279     2.498
   ∞    .12566    .25335    .38532    .52440    .67449    .84162   1.03643   1.28155   1.64485   1.95996   2.32634   2.57582

Note: $\tau_{(P,N,k)} = \sqrt{\dfrac{N - k}{k(N - 1)}} \cdot \tau_{(P,N,1)}$.


Further, let $n_1 + 1 = N - k$, $n_2 + 1 = k$, $S_1(x - \bar{x}_1)^2$ be the sum of squared
deviations in the first sub-sample and similarly $S_2(x - \bar{x}_2)^2$ be that for the
second. Then Fisher has shown 2 that the generalized

(13)    $t = \dfrac{(\bar{x}_2 - \bar{x}_1) \cdot \sqrt{n}}{\sqrt{S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2}} \cdot \sqrt{\dfrac{(n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2}}$

is distributed as before for $n = n_1 + n_2$. Obviously,

$N \cdot \bar{x} = (n_1 + 1)\bar{x}_1 + (n_2 + 1)\bar{x}_2$,

whence

(14)    $\delta = \dfrac{(n_1 + 1)(\bar{x}_2 - \bar{x}_1)}{N} = \dfrac{(n_1 + 1)(\bar{x} - \bar{x}_1)}{n_2 + 1}$

and

(15)    $S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2 = N \cdot s^2 - (n_1 + 1)(\bar{x}_1 - \bar{x})^2 - (n_2 + 1)(\bar{x}_2 - \bar{x})^2 = N \left( s^2 - \dfrac{n_2 + 1}{n_1 + 1} \cdot \delta^2 \right)$,

whence

(16)    $t = \dfrac{\sqrt{n k} \cdot \tau}{\sqrt{n + 2 - k - k \tau^2}}$    (where $n = N - 2$),

i.e., $t = \sqrt{n} \cdot \tan \theta$, $\sqrt{n + 2 - k} \cdot \sin \theta = \sqrt{k} \cdot \tau$.

In connection with analysis of variance where the total sample may be divided
into several subsets of observations, the generalized $\tau$ may be used, accordingly,
to indicate in a simple manner which (if any) of the means of subsets differ
significantly from the general mean where the equivalent t-test is applicable.

In general let $\tau_{(P,N,k)} \ge 0$ be a number such that P is the probability that
$|\tau| > \tau_{(P,N,k)}$, where, as above, N is the total number of observations in the
whole sample, k is the number of these in the subsample and $\tau$ is defined by (12).
Then by (16), obviously,

(17)    $\tau_{(P,N,k)} = \sqrt{\dfrac{N - k}{k(N - 1)}} \cdot \tau_{(P,N,1)}$.

In Table II are given values of $\tau_{(P,N,1)}$ for a range of values of the arguments, N
and P. The critical values of $\tau$ in Table I are simply values of this function for
$P = \phi/N$ where $\phi$ is taken as parameter, i.e., $\tau_{(\phi/N,\,N,\,1)}$.
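Relation (17) allows any entry of Table II to be carried over to a subsample of k observations. The sketch below merely evaluates that relation; the function name is hypothetical, and the illustrative entry 1.895 is the tabulated value for N = 10, P = 0.05.

```python
import math


def tau_for_subsample(tau_single, N, k):
    """Relation (17): critical value for the mean of k of the N observations."""
    if not 1 <= k < N:
        raise ValueError("k must satisfy 1 <= k < N")
    return math.sqrt((N - k) / (k * (N - 1))) * tau_single


tau_1 = 1.895                                # Table II, N = 10, P = 0.05, k = 1
tau_5 = tau_for_subsample(tau_1, 10, 5)      # half of the sample forms the subset
tau_same = tau_for_subsample(tau_1, 10, 1)   # k = 1 must reproduce the tabulated entry
```

With N = 10 and k = 5 the multiplier is exactly one third, so the critical deviation of a subset mean from the general mean is correspondingly smaller than that for a single observation.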

Rider 3 has given an interesting review of rejection criteria previously proposed. 


BIBLIOGRAPHY 

1. Irwin, J. O., Biometrika, 17 (1925), pp. 100-128; 17 (1925), pp. 238-250.

2. Fisher, R. A., Metron, 5 (1925), No. 3, pp. 90-104, 109-112.

“Student,” Metron, 5 (1925), No. 3, pp. 105-108, 113-120.

3. Rider, Paul R., St. Louis, Washington University Studies, new series, Science and 

Technology, No. 8 (1933), 23 pp. 


Yale University. 




ON CERTAIN COEFFICIENTS USED IN MATHEMATICAL STATISTICS 

By Everett H. Larguier, S.J. 

I. Introduction 

(1.1) We have studied here certain coefficients arising in interpolation, numeri¬ 
cal differentiation and integration formulas in order to establish explicit expan¬ 
sions for these coefficients in the form of a finite summation. Ordinarily they 
are obtained by means of recursion relations, which necessarily demand the 
building up of a complete table in order to find the desired set of coefficients. By 
using the methods described in this paper, we are able to calculate any desired 
set independent of the ones which precede it in the table. In the literature we 
find two other expansions of the difference quotients of zero , one by Jeffery 1 and 
one by Boole. 2 Our expansion for the differential quotients of zero is the same as 
one obtained by Jeffery, 3 however the proof is more elementary and simple. 

The Bernoulli numbers also find a wide range of application in many finite 
integration formulas, and hence our attention was drawn to the discussion of 
certain coefficients which occur in the study of these functions. 4 As in the cases 
mentioned above these coefficients are likewise ordinarily obtained by recursion 
formulas, but by our expansions they may be obtained directly. 

II. Difference Quotients of Zero 

(2.1) It is our purpose here to show that this difference quotient of zero, $\Delta^m 0^n$,
may be expressed by the following summation:



where $a_1, a_2, \ldots, a_{m-1} = 0, 1, 2, \ldots, n - m$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-1} \ge 0$.

Obviously the number of terms in the summation is the number of combinations
of $n - m + 1$ things taken $m - 1$ together where repetitions are allowed.

(2.2) By means of the recursion relation 5 

$\Delta^m 0^n = m\, \Delta^m 0^{n-1} + m\, \Delta^{m-1} 0^{n-1}$    (2)

1 Henry M. Jeffery, “On a method of expressing the combinations and homogeneous 
products of numbers and their powers by means of differences of nothing.” Quarterly 
Journal of Pure and Applied Mathematics, vol. 4 (1861), pp. 364 ff. 

2 George Boole, A Treatise on the Calculus of Finite Differences, (Stechert, N. Y.), p. 20.

3 Loc. cit.

4 Steffensen, Interpolation (Williams & Wilkins, Baltimore), p. 125.

5 L. M. Milne-Thompson, Calculus of Finite Differences, (Macmillan), p. 36, sec. 2.53, (2).



we are able to build up a table of values. By substitution it can be shown that 
(1) satisfies the values of this table except when m = 0, 1 and for m > n, for 
then the summation becomes meaningless. We therefore define the summation 
to have the value 0 for m = 0, n > 0 and for m > n, and the value 1 for m = 1. 
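We exhibit one substitution below. When $m = 3$ and $n = 4$,

\[ \Delta^3 0^4 = 3! \left\{ \left(\tfrac{2}{1}\right)^0 \left(\tfrac{3}{2}\right)^0 + \left(\tfrac{2}{1}\right)^1 \left(\tfrac{3}{2}\right)^0 + \left(\tfrac{2}{1}\right)^1 \left(\tfrac{3}{2}\right)^1 \right\} = 6\,(1 + 2 + 3) = 36, \]

which agrees with the value $3\,\Delta^3 0^3 + 3\,\Delta^2 0^3 = 18 + 18 = 36$ found by (2).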
(2.3) Taking (2), we proceed by repeated application of the recursion formula
and finally we have

\[ \Delta^m 0^n = m^{n-m}\, \Delta^m 0^m + \sum_{d=m}^{n-1} m^{n-d}\, \Delta^{m-1} 0^d, \]

which, since $\Delta^m 0^m = m!$,⁶ becomes

\[ \Delta^m 0^n = m^{n-m}\, m! + \sum_{d=m}^{n-1} m^{n-d}\, \Delta^{m-1} 0^d. \tag{3} \]
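The table of values built from (2), and its agreement with (3), can be sketched as follows. This is an illustrative Python fragment, not part of the original paper; the names `difference_table` and `by_formula_3` are hypothetical, and the starting value $\Delta^0 0^0 = 1$ is an assumption made only to seed the recursion.

```python
from math import factorial


def difference_table(n_max):
    """table[m][n] = Delta^m 0^n for 0 <= m, n <= n_max, filled in by (2)."""
    table = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 1  # starting value assumed for this sketch
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            table[m][n] = m * table[m][n - 1] + m * table[m - 1][n - 1]
    return table


def by_formula_3(table, m, n):
    """Right-hand side of (3), reading Delta^(m-1) 0^d from the table."""
    tail = sum(m ** (n - d) * table[m - 1][d] for d in range(m, n))
    return m ** (n - m) * factorial(m) + tail


table = difference_table(8)
print(table[3][4], by_formula_3(table, 3, 4))
```

Entries with $m > n$ are left at zero, in keeping with the definitions made in (2.2).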

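We will now prove (1). Proceeding by induction we assume (1) true for
$m - 1$. Hence from (3) we have

\[ \Delta^m 0^n = m^{n-m}\, m! + \sum_{d=m}^{n-1} m^{n-d}\, (m-1)! \sum \left(\frac{2}{1}\right)^{a_1} \left(\frac{3}{2}\right)^{a_2} \cdots \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \]

where $a_1, a_2, \ldots, a_{m-2} = 0, 1, 2, \ldots, d - m + 1$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-2} \ge 0$. This becomes

\[ \Delta^m 0^n = m^{n-m}\, m! + m! \sum_{d=m}^{n-1} m^{n-d-1} \sum \left(\frac{2}{1}\right)^{a_1} \left(\frac{3}{2}\right)^{a_2} \cdots \left(\frac{m-1}{m-2}\right)^{a_{m-2}}. \tag{4} \]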


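⁶ Milne-Thomson, loc. cit.

Using the symbol $\Sigma\Sigma$ for the double summation of (4), and observing that
$m^{n-d-1} = \left(\frac{2}{1}\right)^{n-d-1} \left(\frac{3}{2}\right)^{n-d-1} \cdots \left(\frac{m}{m-1}\right)^{n-d-1}$, we may write

\[ \Sigma\Sigma = \sum_{d=m}^{n-1} \sum \left(\frac{2}{1}\right)^{a_1 + n - d - 1} \left(\frac{3}{2}\right)^{a_2 + n - d - 1} \cdots \left(\frac{m-1}{m-2}\right)^{a_{m-2} + n - d - 1} \left(\frac{m}{m-1}\right)^{n-d-1}. \]

Now in each term the exponents form a non-increasing sequence, none of them
exceeding $n - m$, and the last of them, $n - d - 1$, takes the values
$0, 1, \ldots, n - m - 1$, since $d$ varies from $m$ to $n - 1$. The only term of (1) not
obtained in this way is that in which every exponent is equal to $n - m$, and the
value of this term is $m^{n-m}$. Hence by including $m^{n-m}$ under the summation we are
able to replace the double summation by a single one and have

\[ \Delta^m 0^n = m! \sum \left(\frac{2}{1}\right)^{a_1} \left(\frac{3}{2}\right)^{a_2} \cdots \left(\frac{m}{m-1}\right)^{a_{m-1}} \]

where $a_1, a_2, \ldots, a_{m-1} = 0, 1, 2, \ldots, n - m$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-1} \ge 0$.

Hence (1) is proved.⁷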


III. Differential Quotients of Zero 

(3.1) In Markoff's formula for numerical differentiation we meet coefficients
of the type $D^m 0^{(n)}$. We will show here that this differential quotient of zero
may be expressed by the following finite sum:

\[ D^m 0^{(n)} = (-1)^{n-m}\, m! \sum (p_1 p_2 \cdots p_{n-m}) \tag{5} \]

where $p_1 > p_2 > \cdots > p_{n-m} > 0$ take on values from $1, 2, \ldots, n - 1$. Obviously
the number of terms in the expansion will be the same as the number of
combinations of $n - 1$ things taken $n - m$ together without repetitions.
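The finite sum (5) may be evaluated mechanically. The sketch below, in Python, is an illustration and not part of the original paper; it takes $0^{(n)}$ to mean the factorial power $x(x-1)\cdots(x-n+1)$ evaluated at $x = 0$ after differentiation, and the function names `d_by_sum` and `d_direct` are hypothetical.

```python
from itertools import combinations
from math import factorial, prod


def d_by_sum(m, n):
    """Expansion (5), for 0 < m <= n.

    Sums the products of n - m distinct numbers chosen from 1..n-1.
    For m = n the single empty product is 1, matching the definition
    made for that case in (3.2).
    """
    total = sum(prod(choice) for choice in combinations(range(1, n), n - m))
    return (-1) ** (n - m) * factorial(m) * total


def d_direct(m, n):
    """m-th derivative at 0 of x(x-1)...(x-n+1): m! times the x**m coefficient."""
    coeffs = [1]  # coeffs[i] is the coefficient of x**i
    for j in range(n):
        product = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            product[i + 1] += c      # contribution of x * c x**i
            product[i] -= j * c      # contribution of -j * c x**i
        coeffs = product
    return factorial(m) * coeffs[m]


print(d_by_sum(2, 4), d_direct(2, 4))
```

The second function expands the factorial polynomial directly, which gives an independent check on the expansion.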

(3.2) By means of the recursion formula⁸

\[ D^m 0^{(n)} = (1 - n)\, D^m 0^{(n-1)} + m\, D^{m-1} 0^{(n-1)} \tag{6} \]

we are able to build up a table of values. By substitution it can easily be shown
that (5) satisfies the values of the table when $n > m > 0$. For the other values
the summation is meaningless, hence we define it to have the value 1 for
$m = n > 0$, and the value 0 for $m > n$ and $m = 0$. When $m = 2$ and $n = 4$,
we have

\[ D^2 0^{(4)} = (-1)^{4-2}\, 2!\, \{(3 \cdot 2) + (3 \cdot 1) + (2 \cdot 1)\} = 22, \]

which is the same value as found by (6).
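The table of values referred to here can be built up from (6) in a few lines. The following Python fragment is only a sketch and not part of the original paper; the name `differential_table` is hypothetical, and the starting value $D^0 0^{(0)} = 1$ is an assumption used to seed the recursion.

```python
def differential_table(n_max):
    """table[m][n] = D^m 0^(n) for 0 <= m, n <= n_max, filled in by (6)."""
    table = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 1  # starting value assumed for this sketch
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            table[m][n] = (1 - n) * table[m][n - 1] + m * table[m - 1][n - 1]
    return table


table = differential_table(6)
for m in range(1, 5):
    print(m, table[m][4])
```

The row for $n = 4$ contains the value 22 at $m = 2$ used in the example above.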

⁷ Our expansion may be shown to be equal to that of Jeffery cited in the introduction,
which expresses $\Delta^m 0^{m+n}$ as $m!$ times the sum of all the homogeneous products
of $n$ dimensions which can be formed by the first $m$ natural numbers and their powers. The
proof of Jeffery's expansion involves the use of complicated symbolic operators, while our
proof uses elementary notions only.

⁸ Steffensen, op. cit., pp. 57, 58, (12) and (14).





(3.3) Returning to (6), we obtain by its repeated application:

\[ D^m 0^{(n)} = (-1)^{n-m} (n-1)^{(n-m)}\, D^m 0^{(m)} + m \sum_{a=0}^{n-m-1} (-1)^a (n-1)^{(a)}\, D^{m-1} 0^{(n-a-1)} \]

or, since $D^m 0^{(m)} = m!$,

\[ D^m 0^{(n)} = (-1)^{n-m} (n-1)^{(n-m)}\, m! + m \sum_{a=0}^{n-m-1} (-1)^a (n-1)^{(a)}\, D^{m-1} 0^{(n-a-1)}. \tag{7} \]
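Relation (7) can itself be used as a recursive rule of computation. The Python sketch below is illustrative only and not part of the original paper; it reads $(n-1)^{(a)}$ as the factorial power $(n-1)(n-2)\cdots(n-a)$, and the names `falling` and `d_quot` are hypothetical.

```python
from functools import lru_cache
from math import factorial


def falling(x, k):
    """Factorial power x^(k) = x (x-1) ... (x-k+1); equal to 1 when k = 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


@lru_cache(maxsize=None)
def d_quot(m, n):
    """D^m 0^(n) computed from (7), descending in m."""
    if m == 0:
        return 1 if n == 0 else 0
    if m > n:
        return 0
    lead = (-1) ** (n - m) * falling(n - 1, n - m) * factorial(m)
    tail = sum(
        (-1) ** a * falling(n - 1, a) * d_quot(m - 1, n - a - 1)
        for a in range(n - m)
    )
    return lead + m * tail


print(d_quot(2, 4), d_quot(3, 4))
```

The boundary values for $m = 0$ and $m > n$ follow the definitions of (3.2).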

In proving (5), we proceed by induction, assuming (5) true for $m - 1$; hence
by (7) we have

\[ D^m 0^{(n)} = (-1)^{n-m} (n-1)^{(n-m)}\, m! + m! \sum_{a=0}^{n-m-1} \sum (-1)^{n-m} (n-1)^{(a)} (p_1 \cdots p_{n-m-a}) \tag{8} \]

where $p_1 > p_2 > \cdots > p_{n-m-a} > 0$ take the values $1, 2, \ldots, n - a - 2$.
Expanding the double sum of (8), apart from the common factor $(-1)^{n-m}$, we have

\[
\begin{aligned}
\Sigma\Sigma ={}& \sum_{p_i = 1}^{n-2} (p_1 \cdots p_{n-m}) + \sum_{p_i = 1}^{n-3} (n-1)(p_1 \cdots p_{n-m-1}) \\
&+ \sum_{p_i = 1}^{n-4} (n-1)(n-2)(p_1 \cdots p_{n-m-2}) \\
&+ \cdots + \sum_{p_1 = 1}^{m-1} (n-1)(n-2) \cdots (m+1)(p_1)
\end{aligned}
\tag{9}
\]

in which $p_1 > p_2 > \cdots > p_s > 0$ always holds, where

$s = n - m,\ n - m - 1,\ \ldots,\ 2,\ 1$

in turn.

Upon inspection, it is evident that (9) contains all the terms of (5) with the
exception of $(n-1)(n-2) \cdots (m+1)m$. Hence, since by definition
$(n-1)^{(n-m)} = (n-1) \cdots (m+1)m$, we may include the first term on the
right-hand side of (8) under the summation and then we have proved (5).⁹

IV. The Coefficient $G_n^{(r)}$

(4.1) In discussing the Bernoulli numbers and the Bernoulli polynomials,
Steffensen¹⁰ makes use of the relation:

\[ B_{2r}(x) = (-1)^r \sum G_n^{(r)} z^{r-n} \tag{10} \]


⁹ Jeffery's expansion referred to in the introduction expresses $D^m 0^{(n)}$ by means of
the sum of the combinations of the first $n - 1$ natural numbers taken $n - m$
together. The remarks made above under article 2.3 concerning symbolic operators also
apply here mutatis mutandis.

¹⁰ Op. cit., p. 125, (24); cf. also Jacobi's theorem, Journal für reine und angewandte
Mathematik (Crelle's Journal), vol. 12, pp. 268-269.





EVERETT H. LARGUIER 


where $z = x - x^2$. We wish here to show that the coefficient $G_n^{(r)}$, ordinarily
found by means of recursion formulas, may be obtained from the following
summation:

\[ G_n^{(r)} = (2r)^{(2n)} \sum_{N_n = 3}^{r-n+1} [N_n] \sum_{N_{n-1} = 3}^{N_n + 1} [N_{n-1}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \tag{11} \]

where $[N] = (N)^{(2)}/(2N)^{(4)}$. Obviously the summation has no meaning for
$n = 0$, nor for $r < n + 2$. Therefore it will be necessary to make definitions
or devise other schemes for meeting this difficulty.

Steffensen¹¹ shows that

\[ G_0^{(r)} = 1 \ \text{for } r \ge 0; \qquad G_{r-1}^{(r)} = 0 \ \text{for } r > 1; \tag{12} \]

and likewise he gives the following recursion relation:

\[ (2r - 2n)^{(2)}\, G_n^{(r)} = (2r)^{(2)}\, G_n^{(r-1)} + (r - n + 1)^{(2)}\, G_{n-1}^{(r)}. \tag{13} \]

In accordance with (12), we define the sum of (11) to be equal to 1 for $n = 0$,
and to be equal to 0 for $n = r - 1$, when $r > 1$. By means of the recursion
formula (13), Steffensen¹² gives a table of values of $G_n^{(r)}$, which (11) may be
easily shown to satisfy. From this table we have the value $G_3^{(6)} = 10$. Using
this as an example of the expansion, we have by (11):

\[
\begin{aligned}
G_3^{(6)} &= (12)^{(6)} \sum_{N_3 = 3}^{4} [N_3] \sum_{N_2 = 3}^{N_3 + 1} [N_2] \sum_{N_1 = 3}^{N_2 + 1} [N_1] \\
&= (12)^{(6)} \big\langle\, [3]\{[4]([5] + [4] + [3]) + [3]([4] + [3])\} \\
&\qquad + [4]\{[5]([6] + [5] + [4] + [3]) + [4]([5] + [4] + [3]) + [3]([4] + [3])\} \,\big\rangle \\
&= 10.
\end{aligned}
\]
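The nested summation (11) can be evaluated by a short recursive routine. The Python sketch below is an illustration and not part of the original paper; it reads $x^{(k)}$ as the factorial power $x(x-1)\cdots(x-k+1)$, and the names `falling`, `bracket`, `nested_sum` and `G` are hypothetical.

```python
from fractions import Fraction


def falling(x, k):
    """Factorial power x^(k) = x (x-1) ... (x-k+1); equal to 1 when k = 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def bracket(N):
    """The symbol [N] = N^(2) / (2N)^(4)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))


def nested_sum(level, upper):
    """Sum over N_level = 3..upper of [N_level] times the next inner sum,
    whose upper limit is N_level + 1. An empty nest (level 0) counts as 1."""
    if level == 0:
        return Fraction(1)
    return sum(
        (bracket(N) * nested_sum(level - 1, N + 1) for N in range(3, upper + 1)),
        Fraction(0),
    )


def G(n, r):
    """Expansion (11), intended for n >= 1 and r >= n + 2."""
    return falling(2 * r, 2 * n) * nested_sum(n, r - n + 1)


print(G(3, 6))
```

Exact fractions are used throughout because the individual symbols $[N]$ are small rationals whose sum must be multiplied by a large factorial power.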

(4.2) Before proving the general case, we will prove by induction that

\[ G_1^{(r)} = (2r)^{(2)} \sum_{N_1 = 3}^{r} [N_1]. \tag{14} \]

Assuming (14) true for $r - 1$, we have by (12) and (13)

\[ G_1^{(r)} = (2r)^{(2)} \sum_{N_1 = 3}^{r-1} [N_1] + (2r)^{(2)} [r] = (2r)^{(2)} \sum_{N_1 = 3}^{r} [N_1]. \]

Hence (14) is valid.
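The recursion (13), with the boundary values (12), can be compared numerically with (14). The following Python fragment is a sketch only and not part of the original paper; the names `falling`, `bracket`, `G_rec` and `G1_by_14` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def falling(x, k):
    """Factorial power x^(k) = x (x-1) ... (x-k+1); equal to 1 when k = 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def bracket(N):
    """The symbol [N] = N^(2) / (2N)^(4)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))


@lru_cache(maxsize=None)
def G_rec(n, r):
    """G_n^(r) for 0 <= n <= r - 1, from the boundary values (12) and recursion (13)."""
    if n == 0:
        return Fraction(1)
    if n == r - 1:
        return Fraction(0)
    numerator = (
        falling(2 * r, 2) * G_rec(n, r - 1)
        + falling(r - n + 1, 2) * G_rec(n - 1, r)
    )
    return numerator / falling(2 * r - 2 * n, 2)


def G1_by_14(r):
    """Right-hand side of (14)."""
    return falling(2 * r, 2) * sum((bracket(N) for N in range(3, r + 1)), Fraction(0))


print(G_rec(1, 5), G1_by_14(5))
```

For $r = 2$ the sum in (14) is empty, so both routines give zero, in agreement with (12).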

¹¹ Op. cit., p. 125.

¹² Op. cit., p. 126.

(4.3) We shall prove (11) with respect to $r$. By repeated application of (13),
we have

\[
\begin{aligned}
G_n^{(r)} ={}& \{(2r)^{(2)}/(2r-2n)^{(2)}\}\, G_n^{(r-1)} + \{(2r)^{(2)} (r-n+1)^{(2)}/(2r-2n+2)^{(4)}\}\, G_{n-1}^{(r-1)} \\
&+ \{(2r)^{(2)} (r-n+1)^{(2)} (r-n+2)^{(2)}/(2r-2n+4)^{(6)}\}\, G_{n-2}^{(r-1)} + \cdots \\
&+ \{(2r)^{(2)} (r-n+1)^{(2)} \cdots (r-1)^{(2)}/(2r-2)^{(2n)}\}\, G_1^{(r-1)} \\
&+ \{(r-n+1)^{(2)} \cdots (r)^{(2)}/(2r-2)^{(2n)}\}\, G_0^{(r)} \\
={}& (2r)^{(2n)} \sum_{N_n = 3}^{r-n} [N_n] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \\
&+ (2r)^{(2n)} [r-n+1] \sum_{N_{n-1} = 3}^{r-n+1} [N_{n-1}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \\
&+ (2r)^{(2n)} [r-n+1][r-n+2] \sum_{N_{n-2} = 3}^{r-n+2} [N_{n-2}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] + \cdots \\
&+ (2r)^{(2n)} [r-n+1][r-n+2] \cdots [r-1] \sum_{N_1 = 3}^{r-1} [N_1] \\
&+ (2r)^{(2n)} [r-n+1] \cdots [r].
\end{aligned}
\]

It is evident from inspection that this is nothing but an expanded form of (11),
hence (11) is proved with respect to $r$.

(4.4) Proceeding in the same way as above to prove (11) by induction with respect to
$n$, we have again by repeated application of (13)

\[
\begin{aligned}
G_n^{(r)} ={}& \{(r-n+1)^{(2)}/(2r-2n)^{(2)}\}\, G_{n-1}^{(r)} + \{(2r)^{(2)} (r-n)^{(2)}/(2r-2n)^{(4)}\}\, G_{n-1}^{(r-1)} \\
&+ \{(2r)^{(4)} (r-n-1)^{(2)}/(2r-2n)^{(6)}\}\, G_{n-1}^{(r-2)} \\
&+ \cdots + \{(2r)^{(2r-2n-4)} (3)^{(2)}/(2r-2n)^{(2r-2n-2)}\}\, G_{n-1}^{(n+2)} \\
={}& (2r)^{(2n)} [r-n+1] \sum_{N_{n-1} = 3}^{r-n+2} [N_{n-1}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \\
&+ (2r)^{(2n)} [r-n] \sum_{N_{n-1} = 3}^{r-n+1} [N_{n-1}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \\
&+ \cdots + (2r)^{(2n)} [4] \sum_{N_{n-1} = 3}^{5} [N_{n-1}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \\
&+ (2r)^{(2n)} [3] \sum_{N_{n-1} = 3}^{4} [N_{n-1}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1].
\end{aligned}
\]

From this latter equation, (11) follows immediately and therefore the proof is
complete.

(4.5) Bernoulli numbers may be expressed in terms of this coefficient $G_n^{(r)}$, as is
shown by Steffensen,¹³ in the following way

\[ B_{2r} = (-1)^r\, G_r^{(r)} \tag{15} \]

¹³ Op. cit., p. 125, (27).





which we shall express in terms of (11). However, as (11) is meaningless for
$n = r$, we obtain the relation

\[ (2r + 2)^{(2)}\, G_r^{(r)} = -(2)^{(2)}\, G_{r-1}^{(r+1)} \quad \text{for } r > 0, \tag{16} \]

which follows immediately from (12) and (13), and thereby obviate this difficulty.
Hence, by (11), (15) and (16), we can write

\[ B_{2r} = \{(-1)^{r+1} (2r)!/(4)^{(2)}\} \sum_{N_{r-1} = 3}^{3} [N_{r-1}] \sum_{N_{r-2} = 3}^{N_{r-1} + 1} [N_{r-2}] \cdots \sum_{N_1 = 3}^{N_2 + 1} [N_1] \tag{17} \]

We note here that the definitions of the summation, given in 4.1, likewise hold.
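Formula (17) gives the Bernoulli numbers of even order directly. The Python sketch below is an illustration and not part of the original paper; the names `falling`, `bracket`, `nested_sum` and `bernoulli_2r` are hypothetical. For $r = 1$ the nest of sums is empty and is taken equal to 1, following the definition made in 4.1 for $n = 0$.

```python
from fractions import Fraction
from math import factorial


def falling(x, k):
    """Factorial power x^(k) = x (x-1) ... (x-k+1); equal to 1 when k = 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def bracket(N):
    """The symbol [N] = N^(2) / (2N)^(4)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))


def nested_sum(level, upper):
    """Sum over N_level = 3..upper of [N_level] times the next inner sum,
    whose upper limit is N_level + 1. An empty nest (level 0) counts as 1."""
    if level == 0:
        return Fraction(1)
    return sum(
        (bracket(N) * nested_sum(level - 1, N + 1) for N in range(3, upper + 1)),
        Fraction(0),
    )


def bernoulli_2r(r):
    """B_{2r} from (17), for r >= 1; the outermost sum has upper limit 3."""
    coefficient = (-1) ** (r + 1) * Fraction(factorial(2 * r), falling(4, 2))
    return coefficient * nested_sum(r - 1, 3)


for r in range(1, 6):
    print(2 * r, bernoulli_2r(r))
```

The number of terms grows quickly with $r$, so this direct evaluation is suited to checking small orders rather than to extended tabulation.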

Saint Louis University 
Saint Louis, Missouri 



NOTICE OF THE ORGANIZATION OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 

For some time there has been a feeling that the theory of statistics would be
advanced in the United States by the formation of an organization of those persons
especially interested in the mathematical aspects of the subject. As a consequence,
a meeting of interested persons was arranged for September 12, 1935, at
Ann Arbor, Michigan. At the meeting, it was decided to form an organization 
to be known as the Institute of Mathematical Statistics. A constitution and 
by-laws were adopted and the following officers elected to serve until December 
31st, 1936: President, H. L. Rietz; Vice-president, W. A. Shewhart; Secretary- 
Treasurer, A. T. Craig. A resolution, instructing the officers to investigate the 
feasibility of the affiliation of the Institute with the American Mathematical 
Society or with the American Statistical Association, was adopted. 

The constitution provides that membership in the Institute shall consist of 
Members, Fellows, Honorary Members, and Sustaining Members. A committee
on membership will establish qualifications requisite for the different
grades of membership. The annual dues of members and fellows are five dollars 
a year and these include a year’s subscription to the official journal, the Annals 
of Mathematical Statistics. 

The next meeting of the Institute will be held in St. Louis, Missouri, in 
December of this year in connection with the meetings of the American Association
for the Advancement of Science, the American Mathematical Society, and
other organizations. 

Forms for application for membership in the Institute may be had by writing 
the Secretary-Treasurer at the University of Iowa, Iowa City, Iowa.





1935 DIRECTORY OF SUBSCRIBERS TO THE ANNALS OF 
MATHEMATICAL STATISTICS 

INDIVIDUALS 

Acerboni, Dr. Argentino V., Banfield, Larroque 232, U. T. 94, Argentine.

Armstrong, Charles M., Jr., 1338 Dean Street, Schenectady, N. Y. 

Anderson, Robert, 1104 New Federal Building, St. Paul, Minn. 

Aroian, Leo. A., Colorado State College, Fort Collins, Colo. 

Bachelor, Robert W., 1437 Bancroft Way, Berkeley, Calif. 

Bailey, W. B., The Travelers Insurance Company, Hartford, Conn. 

Baten, Prof. W. D., University of Michigan, Ann Arbor, Mich. 

Beale, Dr. Frank S., 316 West Packer Avenue, Bethlehem, Penn. 

Been, Richard, United States Bureau of Agricultural Economics, Washington, D. C. 
Black, Loren T., R. F. D. No. 3, Peru, Indiana. 

Blackadar, W. L., Assistant Actuary, Equitable Life Assurance Society, New York City. 
Blackburn, Prof. Raymond F., University of Pittsburg, Pittsburg, Penn. 

Borgward, Prof. F. Wm., College of Business Administration, Syracuse University, 
Syracuse, N. Y. 

Brown, Prof. Theodore H., School of Business Administration, Harvard University,
Boston, Mass. 

Bosch, J., Barcelona RDA Universidad, Barcelona, Spain. 

Brumbaugh, Prof. M. A., Bureau of Business and Social Research, University of Buffalo, 
Buffalo, N. Y. 

Bushey, Prof. J. Hobart, Hunter College, New York City. 

Carmichael, John, 2 Aldridge Road, Villas, London W. 11, England. 

Carver, Prof. H. C., University of Michigan, Ann Arbor, Mich. 

Chapman, Dr. Dwight W., Emerson Hall, Cambridge, Mass. 

Chapman, Roy A., Southern Forest Experimental Station, New Orleans, La. 

Cliffe, F. B., General Electric Company, Schenectady, N. Y. 

Coggins, Paul P., Statistician, New Jersey Bell Telephone Company, Newark, N. J. 
Coover, Prof. J. E., Department of Psychology, Stanford University, Calif. 

Craig, Prof. Alan T., University of Iowa, Iowa City, Iowa.

Craig, Prof. C. C., University of Michigan, Ann Arbor, Mich. 

Crathorne, Prof. A. R., University of Illinois, Urbana, Ill. 

Cureton, E. E., Associate Professor of Education, Alabama Polytechnic Institute, Auburn, 
Alabama. 

Dade, E. B., 116 W. Administration Building, Lawrence, Kansas. 

Dodd, Prof. Edward L., University of Texas, Austin, Texas. 

Dorweiler, Paul, Actuarial Department, Aetna Life Insurance Company, Hartford, Conn. 
Dunlap, Jack W., Fordham University, New York City. 

Elston, James S., Travelers Insurance Company, Hartford, Conn. 

Evans, Herbert P., Mathematics Department, University of Wisconsin, Madison, Wisc.
Fertig, Dr. John W., Assistant Biometrician, State Hospital, Worcester, Mass.

Fischer, Dr. Carl H., Mathematics Department, Wayne University, Detroit, Mich.
Fletcher, R. C., Director, Bureau of Social Research, Federation of Social Agencies, 
Pittsburg, Penn. 

Frecheville, G., Windmill Hill Farm, Steeple Claydon, Bucks, England. 


Fritze, C. E., Fredagatan 2, Stockholm, Sweden. 

Glover, Prof. J. W., University of Michigan, Ann Arbor, Mich. 

Greville, Dr. Thos. N. E., Actuarial Department, Acacia Mutual Life Insurance Company,
Washington, D. C.

Griffin, Harold D., Nebraska State Teachers College, Wayne, Nebr. 

Hamilton, Prof. Thos. R., Texas Agricultural and Mechanical College, College Station,
Texas. 

Hammond, H. Pierson, Travelers Insurance Company, Hartford, Conn. 

Harcourt, C. J., Assistant Vice President, New York Telephone Company, New York 
City. 

Harmon, G. E., M. D., Board of Health, Chicago, Ill. 

Henderson, Robert, Vice President and Actuary, Equitable Life Assurance Society, 
New York City. 

Hendricks, Walter A., Junior Biologist, United States Animal Husbandry Experimental 
Station, Beltsville, Maryland. 

Henry, Malcolm H., Mathematics Department, Michigan State College, East Lansing, 
Mich. 

Hildebrandt, Prof. E. H. C., State Teachers College, Upper Montclair, N. J. 

Hill, M. A., Jr., Mathematics Department, University of North Carolina, Chapel Hill, 
N. C. 

Hotelling, Prof. Harold, Department of Economics, Columbia University, New York 
City. 

Huntington, Prof. Earl V., 48 Highland Street, Cambridge, Mass. 

Ingraham, Prof. Mark H., University of Wisconsin, Madison, Wisc.

Keffer, Ralph, Associate Actuary, Aetna Life Insurance Company, Hartford, Conn. 

Kelley, Prof. Truman L., Harvard University, Cambridge, Mass. 

Kellogg, Lester S., College of Commerce and Administration, Ohio State University, 
Columbus, Ohio. 

Kozelka, Richard L., School of Business, University of Minnesota, Minneapolis, Minn. 

Kullback, Solomon, 53118th Street, N.W., Washington, D. C. 

Larsen, Miss Olga, Florida State College for Women, Tallahassee, Fla. 

Larson, Harold D., Mathematics Department, University of New Mexico, Albuquerque, 
N. M. 

Leavens, Dickson H., Harvard Business School, Soldiers Field, Boston, Mass.

MacKinnon, Joseph C., Registrar, Massachusetts Institute of Technology, Cambridge, 
Mass. 

Mahalanobis, Prof. P. C., 210 Cornwallis Street, Calcutta, India. 

Malzberg, Dr. Benj., Senior Statistician, New York State Department of Mental Hygiene, 
Albany, N. Y. 

McEwen, Geo. F., Scripps Institute of Oceanography, University of California, La
Jolla, Calif. 

Miner, Dr. John Rice, School of Hygiene and Public Health, Johns Hopkins University, 
Baltimore, Md. 

Molina, Edward C., Switching Theory Engineer, Bell Telephone Laboratories, New York 
City. 

Mowbray, Prof. Albert H., University of California, Berkeley, Calif. 

Mudgett, Prof. Bruce D., School of Business, University of Minnesota, Minneapolis, 
Minn. 

Neifeld, Dr. Norris R., 883 Brooklyn Avenue, Brooklyn, N. Y. 

Ness, Prof. Marie M., 127 Medical Sciences Building, University of Minnesota,
Minneapolis, Minn.

Nyswander, Prof. James A., University of Michigan, Ann Arbor, Mich. 

Olivero, Jose Bosch, Apartado no. 991, Barcelona, Spain. 




O’Toole, Dr. A. L., College of St. Catherine, St. Paul, Minn. 

Owen, Dr. F. V., United States Department of Agriculture, Salt Lake City, Utah. 

Payne, Dr. Charles K., Washington Square College, New York University, New York 
City. 

Pretorius, S. J., Prima, Stellenbosch, South Africa. 

Regan, Dr. Francis, St. Louis University, St. Louis, Mo. 

Rice, Dr. J. Nelson, 3326 13th Street, N.E., Washington, D. C.

Rider, Dr. Paul R., University College, Gower Street, London, W. C. 1, England. 
Rietz, Prof. H. L., University of Iowa, Iowa City, Iowa. 

Rietz, Dr. Wm., Scientific Research Division, United States Public Health Service, 
Washington, D. C. 

Robb, Richard A., 27 Moor Road, Eaglesham, Renfrewshire, Scotland. 

Roberts, John L., 28 Boody Street, Brunswick, Maine. 

Rohm, John T., Actuary, American Life Insurance Company, Detroit, Mich.

Rorty, Col. Malcolm C., Old Spout Farm, Lusby, Calvert County, Maryland. 

Ross, Dr. Frank A., 405 Fayerweather Hall, Columbia University, New York City. 
Rowell, Miss Dorothy C., 91 Bishop Street, New Haven, Conn. 

Rulon, Philip J., 4 Emerson Hall, Harvard University, Cambridge, Mass. 

Saffian, Miss Sadie, Room 205, 330 S. 9th Street, Philadelphia, Penn.

Schultz, Prof. Henry, Department of Economics, University of Chicago, Chicago, Ill. 
Schumacher, Francis X., United States Forest Service, Washington, D. C. 

Seeley, Burton D., Resettlement Administration, Washington, D. C. 

Shelton, W. Arthur, United States Department of Agriculture, Bureau of Public Roads, 
Washington, D. C. 

Shewhart, Dr. Walter A., Bell Telephone Laboratories, New York City. 

Skonberg, Carl M., Builders and Manufacturers Mutual Casualty Company, Chicago, Ill.
Smith, Hartley Le Huray, 500 Kent Avenue, Brooklyn, N. Y. 

Souto, Dr. Jose Barral, Cordoba, 1535, Buenos Aires, Argentine. 

Springer, William M., Bristol-Myers Company, Research Department, Hillside, N. J. 
Stephan, Frederick F., 722 Woodward Building, Washington, D. C. 

Taitel, Martin, 130 Summers Drive, Alexandria, Va. 

Tao, L. K., Institute of Social Sciences, 3 Wen Tsin Chieh, Hsi An Men, Peiping, China. 
Toops, Prof. Herbert A., Department of Psychology, Ohio State University, Columbus, 
Ohio. 

Topic, F., 11 Narodni, Prague, Czechoslovakia. 

Treanor, Prof. Glen R., School of Business Administration, University of Minnesota, 
Minneapolis, Minn. 

Treloar, Alan E., Biometric Laboratory, University of Minnesota, Minneapolis, Minn. 
Trzaska, Evert Michalski, Krakowskie Przedmiescie 13, Ksiegarnia w Warszawie, Poland.
Tryon, Frederick G., Mineral Economist, Bureau of Mines, Washington, D. C. 

Waite, Dr. Warren C., Director of Agricultural Economics, University Farm, St. Paul, 
Minn. 

Walker, Prof. Helen M., Teachers College, Columbia University, New York City. 
Weida, Prof. Frank M., George Washington University, Washington, D. C. 

Wetmore, Raymond E., Bureau of Business Research, Metropolitan Life Insurance 
Company, New York City. 

Wicksell, Prof. S. D., Statistiska Institutionen vid Lunds Universitet, Lund, Sweden. 
Winsor, Charles P., 123 Holden Green, Cambridge, Mass. 

Wright, Dr. Sewall, Department of Zoology, University of Chicago, Chicago, Ill. 
Yntema, Dr. Theodore O., University of Chicago, Chicago, Ill. 

Zoch, Richmond T., United States Weather Bureau, Washington, D. C. 




LIBRARIES 

Agnes Scott Library, Decatur, Ga. 

Aktiebolaget Nordiska Bokhandeln, Drottninggatan 7, Stockholm, Sweden. 

Antioch College Library, Yellow Springs, Ohio. 

Armour Institute of Technology Library, Chicago, Ill. 

Bell Telephone Laboratories, Technical Library, 463 West Street, New York City. 

Bibli. de la Faculdad de Derecho, Universidad Literaria, Valencia, Spain. 

Bombay University, Bombay, India. 

Brown University Library, Providence, R. I. 

Buchhandlung des Oesterreichischen Bundesverlages, Schwarzenbergstr. 5, Wien 1, Austria. 
Bureau of Statistics, Ministry of Industries, Nanking, China. 

Carnegie Institute of Technology, Schenley Park, Pittsburg, Penn.

California Institute of Technology, Mathematics Library, Pasadena, Calif. 

Columbia University, New York, N. Y. 

Connecticut College, New London, Conn. 

Dartmouth College, Hanover, N. H. 

Dominion Bureau of Statistics, Sussex Street, Ottawa, Canada. 

Fordham University Graduate School, Woolworth Building, New York, N. Y. 

Fundament Bibli., Lesotechnitsch Akademii, Leningrad 18, U. S. S. R. 

Fund. IB-KA Sredne-Asiaisk, Gisud. Uniwersiteta, Potschtowi J Jastschik No. 47, Taschkent,
U. S. S. R.

Giannini Foundation Library, University of California, Berkeley, Calif. 

Goucher College, Baltimore, Md. 

Guthrie and Co., Ltd., Java Street, Kuala Lumpur, Federated Malay States. 

Hamilton Smith College, University of New Hampshire, Durham, N. H. 

Harvard College Library, Cambridge, Mass. 

Hawaiian Sugar Planters Association, Experimental Station, Honolulu, Hawaii. 
Hirschwaldsche Buchhandlung, Unter den Linden 68, Berlin, N.W. 7, Germany.

Imperial Institute of Agricultural Research, Pusa, Bihar, India. 

Indiana University, Bloomington, Ind. 

Institute of Social Sciences, Academia Sinica, Peiping, China. 

Iowa State College, Ames, Iowa. 

Jackson, Son and Company, 73 West George Street, Glasgow, Scotland. 

Lehigh University Library, Bethlehem, Penn. 

Lehman Corp., 1 So. William St., New York, N. Y. 

Library of Congress, Washington, D. C. 

Libreria Victoriano Suarez, Biblioteca de Estadistica, Madrid, Spain. 

Maharajah’s College, Mysore, India. 

Mandelle Memorial Library, Kalamazoo College, Kalamazoo, Mich. 

Mathematisches Inst. der Univ., Bunsenstr. 3-5, Göttingen, Germany.

Michigan State College, East Lansing, Mich. 

Montana State College, Bozeman, Montana. 

Nankai University, Tientsin, China. 

National Resources Com., 2 San Yuan Hong, Nanking, Kiangsu, China. 

National Research Council, Sussex Street, Ottawa, Canada. 

Nautschn Bib-Ka Zunchu, B. Wusowskij per 2, Moskwa, U. S. S. R. 

New York Public Library, 5th Avenue and 42nd Street, New York City. 

New York State College of Agriculture, Agricultural Economics and Farm Management 
Department, Ithaca, N. Y. 

North Dakota State College, Fargo, North Dakota. 

Northwestern University, Evanston, Ill. 



Ohio State University, Columbus, Ohio. 

Peiping Union Medical College, Peiping, China. 

Porter Library, Kansas State Teachers College, Pittsburg, Kansas. 

Post Biblioteki, Akademii Nauk U.S.S.R, Bolsaja Koluzskaja, 24, Moskwa, U.S.S.R. 
Presidency College, Statistical Laboratory, Calcutta, India. 

Purdue University, Lafayette, Indiana. 

Rensselaer Polytechnic Institute, Troy, N. Y. 

Rockefeller Foundation, 49 West 49th Street, New York City. 

Royal Statistical Society, 9 Adelphi Terrace, London, W.C. 2, England. 

Smith College, Northampton, Mass. 

Southern Forest Experimental Station, 400 Union Building, New Orleans, La. 

Stadt Bibliothek, Schone Aussicht 2, Frankfurt A. M., Germany. 

Stanford University, Stanford, Calif.

Statens Skogsforsoksanstalt Biblioteket, Experimentalfältet, Sweden.

Tsing Hua University, Peiping, China.

United States Department of Agriculture, Washington, D. C.

Università Commerciale Luigi Bocconi, Via Solferino, Milano, Italy.

University of Arkansas, Engineering Library, Fayetteville, Arkansas.

University of Buffalo, Edmund Hayes Hall, Buffalo, N. Y.

University of California at Los Angeles, Mathematics Library, Los Angeles, Calif.
University of California at Los Angeles, University Library, Los Angeles, Calif.

University of California, Berkeley, Calif.

University of Chicago, Harper Memorial Library, Chicago, Ill. 

University of Cincinnati, Burnet Woods Park, Cincinnati, Ohio. 

University of Illinois, Urbana, Ill. 

University of Iowa, Iowa City, Iowa. 

University of Kansas, Periodical Department, Lawrence, Kansas. 

University of Kentucky, Lexington, Ky. 

University of London, Department of Applied Statistics, University College, London,
England.

University of Manitoba, Fort Garry Site, Winnipeg, Canada. 

University of Maryland, College Park, Maryland. 

University of Michigan, Ann Arbor, Michigan. 

University of Minnesota, Minneapolis, Minn.

University of Nebraska, Lincoln, Nebr.

University of Oklahoma, Norman, Okla. 

University of Oregon, Eugene, Oregon. 

University of Pennsylvania, Philadelphia, Penn. 

University of Pittsburg, 310 State Hall, Pittsburg, Penn. 

University of Pretoria, Pretoria, South Africa. 

University of Rochester, Rochester, N. Y. 

University of Texas, Austin, Texas. 

University of Toronto, Toronto 5, Canada. 

University of Washington, Seattle, Washington. 

Washington University, St. Louis, Mo. 

Wesleyan University, Middletown, Conn. 

Western Reserve University, Cleveland, Ohio. 

Williams College, Williamstown, Mass. 

Yale University, New Haven, Conn. 

Zentr. Nautschnaja s/ch Biblioteka, Orlikow per 1/11, Moskwa, U. S. S. R. 

Zentralblatt-Bureau, Julius Springer, Linkstrasse 23-24, Berlin, Germany.








